Heaps and Heapsort

69
Heaps and Heapsort Prof. Sin-Min Lee Department of Computer Science San Jose State University

description

Heaps and Heapsort. Lecture 20. Prof. Sin-Min Lee Department of Computer Science San Jose State University. Heaps. Heap Definition. Adding a Node. Removing a Node. Array Implementation. Analysis Source Code. What is Heap?. - PowerPoint PPT Presentation

Transcript of Heaps and Heapsort

Page 1: Heaps and Heapsort

Heaps and HeapsortHeaps and Heapsort

Prof. Sin-Min Lee Department of Computer

Science San Jose State University

Page 2: Heaps and Heapsort

• Heap Definition.

• Adding a Node.

• Removing a Node.

• Array Implementation.

• Analysis

• Source Code.

HeapsHeaps

Page 3: Heaps and Heapsort

What is Heap?

A heap is like a binary search tree. It is a binary tree where the six comparison operators form a total order semantics that can be used to compare the node’s entries. But the arrangement of the elements in a heap follows some new rules that are different from a binary search tree.

Page 4: Heaps and Heapsort

HeapsHeaps

A heap is a certain kind of complete binary tree. It is defined by the following properties.

Page 5: Heaps and Heapsort

Heaps (Priority Queues)

A heap is a data structure which supports efficient implementation of three operations: insert, findMin, and deleteMin. (Such a heap is a min-heap, to be more precise.)

Two obvious implementations are based on an array (assuming the size n is known) and on a linked list. The comparisons of their relative time complexity are as follows:

insert findMin deleteMin Sorted array O(n) O(1) O(1) (from large (to move (find at end) (delete the end) to small) elements)

Linked list O(1) O(1) O(n) (append) (a pointer to Min) (reset Min pointer)

Page 6: Heaps and Heapsort

HeapsHeaps

1. All leaf nodes occur at adjacent levels.

When a completebinary tree is built,

its first node must bethe root.

When a completebinary tree is built,

its first node must bethe root.

Root

Page 7: Heaps and Heapsort

HeapsHeaps

2. All levels of the tree, except for the bottom level are completely filled. All nodes on the bottom level occur as far left as possible.

Left childof theroot

The second node isalways the left child

of the root.

The second node isalways the left child

of the root.

Page 8: Heaps and Heapsort

HeapsHeaps

3. The key of the root node is greater than or equal to the key of each child. Each subtree is also a heap.

Right childof the

root

The third node isalways the right child

of the root.

The third node isalways the right child

of the root.

Page 9: Heaps and Heapsort

HeapsHeaps

The next nodesalways fill the next

level from left-to-right..

The next nodesalways fill the next

level from left-to-right..

Page 10: Heaps and Heapsort

HeapsHeaps

The next nodesalways fill the next

level from left-to-right.

The next nodesalways fill the next

level from left-to-right.

Page 11: Heaps and Heapsort

HeapsHeaps

Complete binary tree.

Page 12: Heaps and Heapsort

Heap Storage Rules• A heap is a binary tree where the entries of the

nodes can be compared with a total order semantics. In addition, these two rules are followed:

• The entry contained by a node is greater than or equal to the entries of the node’s children.

• The tree is a complete binary tree, so that every level except the deepest must contain as many nodes as possible; and at the deepest level, all the nodes are as far left as possible.

Page 13: Heaps and Heapsort

HeapsHeaps

A heap is a certain kind of complete binary tree. 19

4222127

23

45

35

The "heap property"requires that each

node's key is >= thekeys of its children

The "heap property"requires that each

node's key is >= thekeys of its children

Page 14: Heaps and Heapsort

HeapsHeaps

This tree is not a heap. It breaks rule number 1.

19

27

23

45

35

Page 15: Heaps and Heapsort

HeapsHeaps

This tree is not a heap. It breaks rule number 2.

21

23

45

35

Page 16: Heaps and Heapsort

HeapsHeaps

This tree is not a heap. It breaks rule number 3.

2135

23

45

27

Page 17: Heaps and Heapsort

Heap implementation of priority queue

• Each node of the heap contains one entry along with the entry’s priority, and the tree is maintained so that it follows the heap storage rules using the entry’s priorities to compare nodes. Therefore:

• The entry contained by a node has a priority that is greater than or equal to the priority of the entries of the nodes children.

• The tree is a complete binary tree.

Page 18: Heaps and Heapsort

Adding a Node to a HeapAdding a Node to a Heap

Put the new node in the next available spot.

19

4222127

23

45

35

42

Page 19: Heaps and Heapsort

Adding a Node to a HeapAdding a Node to a Heap

Push the new node upward, swapping with its parent until the new node reaches an acceptable location.

19

4222142

23

45

35

27

Page 20: Heaps and Heapsort

Adding a Node to a HeapAdding a Node to a Heap

Push the new node upward, swapping with its parent until the new node reaches an acceptable location.

19

4222135

23

45

42

27

Page 21: Heaps and Heapsort

Adding a Node to a HeapAdding a Node to a Heap

The parent has a key that is >= new node, or

The node reaches the root. The process of pushing the

new node upward is called reheapification upward.

19

4222135

23

45

42

27

Page 22: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a Heap

Move the last node onto the root.

19

4222135

23

45

42

27

Page 23: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a Heap

Move the last node onto the root.

19

4222135

23

27

42

Page 24: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222135

23

27

42

Page 25: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222135

23

42

27

Page 26: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-

place node downward, swapping with its larger child until the new node reaches an acceptable location.

19

4222127

23

42

35

Page 27: Heaps and Heapsort

Removing the Top of a HeapRemoving the Top of a Heap The children all have keys <= the out-of-place node, or

The node reaches the leaf.

The process of pushing the new node downward is called reheapification downward.

19

4222127

23

42

35

Page 28: Heaps and Heapsort

Adding an Entry to a Heap

519

27 21 422

2335

45

As an example, suppose that we already have nine entries that are arranged in a heap with the above priorities.

Page 29: Heaps and Heapsort

Suppose that we are adding a new entry with a priority 42.The first stepis to add this entry in a way that keeps the binary tree complete. In thiscase the new entry will be the left child of the entry with priority 21.

519

27 21 422

2335

45

4242

Page 30: Heaps and Heapsort

The structure is no longer a heap, since the node with priority 21 has a child with a higher priority. The new entry (with priority 42) rises upwarduntil it reaches an acceptable location. This is accomplished by swappingthe new entry with its parent.

519

27 4242 422

2335

45

21

Page 31: Heaps and Heapsort

The new entry is still bigger than its parent, so a second swap is done.

519

27 35 422

234242

45

21

Page 32: Heaps and Heapsort

Adding an entry to a priority queue

Pseudocode for Adding an entry

The priority queue has been implemented as a heap.

1.Place the new entry in the heap in the first available location.This keeps the structure as a complete binary tree, but it might no longer be a heap since the new entry might have a higher priority than its parent.

2. while (the new entry has a priority that is higher than its parent)

swap the new entry with its parent.

Page 33: Heaps and Heapsort

Reheapification upward

Now the new entry stops rising, because its parent (with priority 45) has a higher prioritythan the new entry. In general, the “new entry rising” stops when the new entry reaches the root. The rising process is called reheapification upward.

Page 34: Heaps and Heapsort

Implementing a HeapImplementing a Heap

• We will store the data from the nodes in a partially-filled array.

An array of dataAn array of data

2127

23

42

35

Page 35: Heaps and Heapsort

Implementing a HeapImplementing a Heap

• Data from the root goes in the first location of the array.

An array of dataAn array of data

2127

23

42

35

42

Page 36: Heaps and Heapsort

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

An array of dataAn array of data

2127

23

42

35

42 35 23

Page 37: Heaps and Heapsort

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

• Any node’s two children reside at indexes (2n) and (2n + 1) in the array. An array of An array of datadata

2127

23

42

35

42 35 23 27 21

Page 38: Heaps and Heapsort

Implementing a HeapImplementing a Heap

• Data from the next row goes in the next two array locations.

An array of dataAn array of data

2127

23

42

35

42 35 23 27 21

We don't care what's inWe don't care what's inthis part of the array.this part of the array.

Page 39: Heaps and Heapsort

Important Points about the Implementation

Important Points about the Implementation

• The links between the tree's nodes are not actually stored as pointers, or in any other way.

• The only way we "know" that "the array is a tree" is from the way we manipulate the data.

An array of dataAn array of data

2127

23

42

35

42 35 23 27 21

Page 40: Heaps and Heapsort

Important Points about the Implementation

Important Points about the Implementation

• If you know the index of a node, then it is easy to figure out the indexes of that node's parent and children.

[1][1] [2] [3] [4] [5]

2127

23

42

35

42 35 23 27 21

Page 41: Heaps and Heapsort

Removing an entry from a Heap

• When an entry is removed from a priority, we must always remove the

• entry with the highest priority - the entry that stands “on top of heap”.

• For example, the entry at the root, with priority 45 will be removed.

Page 42: Heaps and Heapsort

The highest priority item, at the root, will be removed from the heap

519

27 35 422

2342

45

21

4545

Page 43: Heaps and Heapsort

Removing an entry from a heap

• During the removal, we must ensure that the resulting structure remains a heap. If the root entry is the only entry in the heap, then there is really no more work to do except to decrement the member variable that is keeping track of the size of the heap. But if there are other entries in the tree, then the tree must be rearranged because a heap is not allowed to run around without a root. The rearrangement begins by moving the last entry in the last level upto the root, like this:

Page 44: Heaps and Heapsort

519

35 422

2342

45

2121

27

2121

422

23

35

42

27

519

Before the moveAfter the move

The last entry in the last levelhas been moved to the root.

Removing an entry from a Heap

Page 45: Heaps and Heapsort

Heapsort

• Heapsort combines the time efficiency of mergesort and the storage efficiency of quicksort. Like mergesort, heapsort has a worst case running time that is O(nlogn), and like quicksort it does not require an additional array. Heapsort works by first transforming the array to be sorted into a heap.

Page 46: Heaps and Heapsort

Heap representation

A heap is a complete binary tree, and an efficient method of representing a complete binary tree in an array follows these rules:

Data from the root of the complete binary tree goes in the [0] component of the array.

The data from depth one nodes is placed in the next two components of the array.

We continue in this way, placing the four nodes of depth two in the next four components of the array, and so forth. For example, a heap with ten nodes can be stored in a ten element array as shown below:

Page 47: Heaps and Heapsort

27

45

42

23 352221

5419

45 27 42 21 23 22 35 19 4 5

45

Page 48: Heaps and Heapsort

Rules for location of data in the array

• The data from the root always appears in the [0] component of the array.

• Suppose that the data for a root node appears in the component [i] of the array. Then the data for its parent is always at location [(i - 1) / 2] (using integer division).

Page 49: Heaps and Heapsort

• Suppose that the data for a node appears in component [i] of the array. Then its children (if they exist)always have their data at these locations:

Left child at component [2i + 1]

Right child at component [2i + 2]

Page 50: Heaps and Heapsort

• The largest value in a heap is stored in the root and that in our array representation of a complete binary tree, the root node is stored in the first array position. Thus, since the array represents a heap, we know that the largest array element is in the first array position. To get the largest value in the correct array position, we simply interchange the first and the final array elements. After interchanging these two array elements, we know that the largest array element is in the final array position, as shown below:

Page 51: Heaps and Heapsort

• The dark vertical line in the array separates the sorted side of the array from the unsorted side (on the left) . Moreover the left side still represents a complete binary tree which is almost a heap.

5 27 42 21 23 22 35 19 4 45

Page 52: Heaps and Heapsort

When we consider the unsorted side as a complete binary tree, it is only the root that violates the heapstorage rule. This root is smaller than its children. The next step of the heapsort is to reposition this out of place value in order to restore the heap. The process called reheapification downward begins by comparing the value in the root with the value of each of its children. If one or both children are larger than the root , then the root’s value is exchanged with the larger of the two children. This moves the troublesome value down one level in the tree. For example:

Page 53: Heaps and Heapsort

27

22

419

21 23 35

5

42

42 27 5 21 23 22 35 19 4 45

Page 54: Heaps and Heapsort

• In the example 5 is still out of place, so we will once again exchange it with its largest child resulting in the array and heap shown below:

27

22

419

21 23

35

5

42

42 27 35 21 23 22 5 19 4 45

Page 55: Heaps and Heapsort

The unsorted side of the array must be maintained a heap. When the unsorted side of the array is once again a heap, the heapsort continues by exchanging the largest element in the unsorted side with the rightmost element of the unsorted side. For example 42 is exchanged with 4 as shown below:

4 27 35 21 23 22 5 19 42 45

Page 56: Heaps and Heapsort

• The sorted side now has two largest elements, and the unsorted side is once again almost a heap, as shown below:

4 27 35 21 23 22 5 19 42 45

522

35

4

27

2321

19

Only the root (4) is out of place, and that may be fixed by another reheapification downward. After the reheapification downward, the largest value of the unsorted side will once again reside at location [0]and we can continue to pull out the next largest value.

Page 57: Heaps and Heapsort

Pseudocode for the heapsort algorithm//Heapsort for the array called data with n elements1. Convert the array of n elements into a heap.2. unsorted = n; // The number of elements in the unsorted

side3. while ( unsorted > 1){ // Reduce the unsorted side by one unsorted - -; Swap data[0] with data [unsorted]. The unsorted side of the array is now a heap with the

root out of place. Do a reheapification downward to turn the unsorted side back into a heap. }

Page 58: Heaps and Heapsort

• While the Heapsort is a constant factor slower than the Quicksort for average data sets, it still has a complexity of O(n log n) for best, worst and average data.

• Reheapification is the process by which an out of place node is exchanged with its parent (upward) or its children (downward) until the node’s key is <= to it’s parent and >= to it’s children. Then reheapification is complete.

Analysis Analysis

Page 59: Heaps and Heapsort

A simpler but related problem: Find the smallest and the second smallest elements in an array (of size n):

• A straightforward method:

Find the smallest using n –1 comparisons; swap it to the end of the array, find the next smallest in n –2 comparisons, for a total of 2n – 3 comparisons.• A method that “saves” results of some earlier comparisons:

(1) Divide the array into two halves, find the smallest elements of each half, call them min1 and min2, respectively; (2) Compare min1 and min2, to find the smallest;

(3) If min1 is the smallest, compare min2 with the remaining elements of the first half to determine the second smallest; similarly for the case if min2 is the smallest.

The number of comparisons is (n –1) + (n/2 –1) = (3n/2 –2.

Page 60: Heaps and Heapsort

The second method can be depicted in the following figure:

min1 min2

min

First half of array Second half of array

In the figure, a link connects a smaller element to a larger one going downward; for example, min min1 and min min2, and min1 each element in the first half of the array, similarly for min2. This organization facilitates finding the smallest and second smallest elements.

H1 H2

Page 61: Heaps and Heapsort

A binary tree data structure for Min-heaps:A binary tree is called a left-complete binary tree if

(1) the tree is full at each level except possibly at the maximum level (depth), where the tree is full at level i means there are 2i nodes at that level (recall the root is at level 0); and

(2) at the tree’s maximum level h, if there are fewer than 2h nodes then the missing nodes are on the right side of the tree.

Missing on the right side

If there are n nodes in a left-complete binary tree of depth h, then 2h n 2h+1 –1. Thus, h lg n < (h +1). In particular, h = O(lg n).

Page 62: Heaps and Heapsort

A binary tree satisfies the (min-)heap-order property if the value at each is less than or equal to the value at each of the child nodes (if exist).

A binary min-heap is a left-complete binary tree that also satisfies the heap-order property.

13

21

24 31

16

19 68

65 26 32

Note that there is no definite order between values in sibling nodes; also, it is obvious that the minimum value is at the root.

A binary min-heap

Page 63: Heaps and Heapsort

Two basic operations on binary min-heaps:

(1)Suppose the value at a node is decreased which causes violation to the heap-order property (because the value is smaller than that of its parent’s). The operation to restore the heap property is named percolateUp (siftUp).

(2)Suppose the value at a node is increased which becomes larger than either (or both) of the value of the children’s. The operation to restore the heap order is named percolateDown (siftDown).

Both operations can be completed in time proportional to the tree height, thus, of time complexity O(lg n) where n is the total number of nodes. These operations are crucial in supporting the heap operations insert and deleteMin.

Page 64: Heaps and Heapsort

The percolateUp (siftUp) operation:

13

21 35

25 37 43 12

The heap order violated at node valued 12

21

25 37 43

13

12

35

percolateUp

21

25 37 43 35

13

12

percolateUpNote that the time of each step (each level) is constant, i.e., compare with parent and swap

Page 65: Heaps and Heapsort

The percolateDown (siftDown) operation:

43

21 35

37 25 45 63

The heap order violated at node valued 43

Each step of percolateDown finds the smaller of the two children (if exist) then swap it with the violating node

35

37 25 45 63

43

21

Who is smaller ?

Who is smaller ?

35

37 45 63

21

25

43

percolateDown

percolate-Down

Page 66: Heaps and Heapsort

Implementation of the heap operations:

insert(): insert a node next to the last node, call percolateUp from it, treating the new node as a violation.

deleteMin(): delete the root node, move the last node to the root, then call percolateDown from the root.

findMin(): return the root’s value.

Finally, a binary heap can be implemented using an array storing the heap elements from the root towards the leaves, level by level and from left to right within each level. The left-completeness property guarantees that there are no “holes” in the array. Also, if we use array indexes 1..n to store n elements, the parent’s index (if exists) of node i is i/2, the left child’s index is 2i, the right child 2i + 1 (if exist). The time complexity of deleteMin and insert both are O(lg n); the time complexity of findMin is O(1).

Page 67: Heaps and Heapsort

An array implementation of a binary heap:

1

2 3

4 5 6

Number the nodes (starting at 1) by levels, from top to bottom and left to right within level

1 2 3 4 5 6 . . . . .

Level 2 nodesLevel 1 nodesRoot

Parent to children (index i to 2i, 2i+1)

Page 68: Heaps and Heapsort

Convert an array T[1..n] into a heap (known as heapify):

A top-down method: repeatedly call insert() by inserting T[i] into a heap T[1..(i –1)], for i = 2 to n.

21

12 9

13 7

Initially, T[1] by itself ia a heap

12

21 9

13 7

Insert 12 into T[1..1], making T[1..2] a heap

9

21 12

13 7

9

13 12

21 7

7

9 12

21 13

Insert 9 into T[1..2], making T[1..3] a heap

Insert 13 into T[1..3], making T[1..4} a heap

Insert 7 into T[1..4], making T[1..5] a heap

The total time complexity of top-down heapify is O(n lg n).

Page 69: Heaps and Heapsort

A bottom-up heapify method:

12 9

13 7

21For i = n/2 down to 1

/* percolate down T[i], making the subtree rooted at i a heap */

call percolateDown(i)n = 5, start at node n/2, the node of the highest index that has any children

Complexity analysis: Node1 travels down h levels during percolateDown, nodes 2 and 3 each (h –1) levels, nodes 4 through 7 each (h –2) levels, etc. The total number of levels traveled is h + 2(h –1) + 4 (h –2) + …+ 2h –1(h – (h – 1)) = 2h –1(1 + 2(1/2) + 3(1/2)2 + 4(1/2)3 + …) 2n, because 2h –1 < n/2 since h lg n, and the infinite series 1 + 2x + 3x2 + … = 1/(1 –x)2 when |x| < 1. Thus, the time complexity of heapify is O(n).