Skiena algorithm 2007 lecture07 heapsort priority queues

Lecture 7: Heapsort / Priority Queues
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794–4400
http://www.cs.sunysb.edu/~skiena

Problem of the Day

Take as input a sequence of 2n real numbers. Design an O(n log n) algorithm that partitions the numbers into n pairs, with the property that the partition minimizes the maximum sum of a pair.

For example, say we are given the numbers (1,3,5,9). The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)). The pair sums for these partitions are (4,14), (6,12), and (10,8). Thus the third partition has 10 as its maximum sum, which is the minimum over the three partitions.

Solution
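The slide leaves the solution blank. One standard approach, offered here as a sketch rather than as the lecture's own answer, is to sort the numbers and pair the smallest with the largest, the second smallest with the second largest, and so on. The function name `min_max_pair_sum` and the helper `cmp_double` are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Comparison function for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts a[0..2n-1] in place, pairs a[i] with a[2n-1-i], and
   returns the largest pair sum of that partition. Assumes n >= 1. */
double min_max_pair_sum(double a[], size_t n)
{
    size_t i;
    double worst;

    qsort(a, 2 * n, sizeof(double), cmp_double);   /* O(n log n) */

    worst = a[0] + a[2 * n - 1];
    for (i = 1; i < n; i++) {                      /* O(n) scan */
        double s = a[i] + a[2 * n - 1 - i];
        if (s > worst) worst = s;
    }
    return worst;
}
```

On the example input (1,3,5,9) this pairs 1 with 9 and 3 with 5, matching the third partition above.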

Importance of Sorting

Why don’t CS profs ever stop talking about sorting?

1. Computers spend more time sorting than anything else, historically 25% on mainframes.

2. Sorting is the best studied problem in computer science, with a variety of different algorithms known.

3. Most of the interesting ideas we will encounter in the course can be taught in the context of sorting, such as divide-and-conquer, randomized algorithms, and lower bounds.

You should have seen most of the algorithms; we will concentrate on the analysis.

Efficiency of Sorting

Sorting is important because once a set of items is sorted, many other problems become easy.
Further, using O(n log n) sorting algorithms leads naturally to sub-quadratic algorithms for these problems.

n          n^2/4            n lg n
10         25               33
100        2,500            664
1,000      250,000          9,965
10,000     25,000,000       132,877
100,000    2,500,000,000    1,660,960

Large-scale data processing would be impossible if sorting took Ω(n^2) time.

Application of Sorting: Searching

Binary search lets you test whether an item is in a dictionary in O(lg n) time.
Search preprocessing is perhaps the single most important application of sorting.
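As a minimal sketch of the search step, assuming the dictionary is a sorted array of ints, binary search can be written as below. The function name `binary_search` is a hypothetical name for illustration; the C standard library offers `bsearch` for the same purpose.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1],
   or -1 if key is not present. */
long binary_search(const int a[], size_t n, int key)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;               /* key can only be to the right */
        else
            hi = mid;                   /* key can only be to the left */
    }
    return -1;
}
```

Each iteration halves the range, which is where the O(lg n) bound comes from.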

Application of Sorting: Closest pair

Given n numbers, find the pair which are closest to each other.
Once the numbers are sorted, the closest pair will be next to each other in sorted order, so an O(n) linear scan completes the job.
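The sort-then-scan idea can be sketched as follows, assuming integer inputs and at least two of them. The names `closest_pair_gap` and `cmp_int` are hypothetical, chosen for this illustration.

```c
#include <stdlib.h>

/* Comparison function for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts a[0..n-1] in place and returns the smallest difference between
   any two elements. Assumes n >= 2. Differences are computed in
   long long to avoid int overflow. */
long long closest_pair_gap(int a[], size_t n)
{
    size_t i;
    long long best;

    qsort(a, n, sizeof(int), cmp_int);

    best = (long long)a[1] - (long long)a[0];
    for (i = 2; i < n; i++) {           /* only adjacent pairs need checking */
        long long gap = (long long)a[i] - (long long)a[i - 1];
        if (gap < best) best = gap;
    }
    return best;
}
```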

Application of Sorting: Element Uniqueness

Given a set of n items, are they all unique or are there any duplicates?
Sort them and do a linear scan to check all adjacent pairs.
This is a special case of closest pair above.

Application of Sorting: Mode

Given a set of n items, which element occurs the largest number of times? More generally, compute the frequency distribution.
Sort them and do a linear scan to measure the length of all adjacent runs.
The number of instances of k in a sorted array can be found in O(log n) time by using binary search to look for the positions of both k − ε and k + ε.
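For integer keys, the two searches for k − ε and k + ε can be sketched as a search for the first position holding a value ≥ k and the first position holding a value > k. The helper names `lower_bound`, `upper_bound`, and `count_occurrences` are hypothetical names for this illustration.

```c
#include <stddef.h>

/* First index i in sorted a[0..n-1] with a[i] >= k, or n if there is none. */
static size_t lower_bound(const int a[], size_t n, int k)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < k) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* First index i in sorted a[0..n-1] with a[i] > k, or n if there is none. */
static size_t upper_bound(const int a[], size_t n, int k)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= k) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Number of copies of k in the sorted array, using two binary searches. */
size_t count_occurrences(const int a[], size_t n, int k)
{
    return upper_bound(a, n, k) - lower_bound(a, n, k);
}
```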

Application of Sorting: Median and Selection

What is the kth largest item in the set?
Once the keys are placed in sorted order in an array, the kth largest can be found in constant time by simply looking in the kth position of the array.
There is a linear time algorithm for this problem, but the idea comes from partial sorting.

Application of Sorting: Convex hulls

Given n points in two dimensions, find the smallest area polygon which contains them all.

The convex hull is like a rubber band stretched over the points.
Convex hulls are the most important building block for more sophisticated geometric algorithms.

Finding Convex Hulls

Once you have the points sorted by x-coordinate, they can be inserted from left to right into the hull, since the rightmost point is always on the boundary.

Sorting eliminates the need to check whether points are inside the current hull.
Adding a new point might cause others to be deleted.

Pragmatics of Sorting: Comparison Functions

Alphabetizing is the sorting of text strings.
Libraries have very complete and complicated rules concerning the relative collating sequence of characters and punctuation.
Is Skiena the same key as skiena?
Is Brown-Williams before or after Brown America, and before or after Brown, John?
Explicitly controlling the order of keys is the job of the comparison function we apply to each pair of elements.
This is how you resolve the question of increasing or decreasing order.

Pragmatics of Sorting: Equal Elements

Elements with equal key values will all bunch together in any total order, but sometimes the relative order among these keys matters.
Sorting algorithms that always leave equal items in the same relative order as in the original permutation are called stable.
Unfortunately very few fast algorithms are stable, but stability can be achieved by adding the initial position as a secondary key.
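The secondary-key trick can be sketched with `qsort`, which the C standard does not guarantee to be stable. The struct `keyed_item` and the functions `cmp_stable` and `stable_sort_items` are hypothetical names introduced only for this illustration.

```c
#include <stdlib.h>

typedef struct {
    int key;        /* the value we actually sort by */
    size_t pos;     /* original position, used only to break ties */
} keyed_item;

/* Orders by key first; equal keys are ordered by original position. */
static int cmp_stable(const void *a, const void *b)
{
    const keyed_item *x = a;
    const keyed_item *y = b;
    if (x->key != y->key)
        return (x->key > y->key) - (x->key < y->key);
    return (x->pos > y->pos) - (x->pos < y->pos);
}

/* Records each item's initial position, then sorts. Because no two
   items compare equal any more, the result is the same as that of a
   stable sort on key alone. */
void stable_sort_items(keyed_item items[], size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        items[i].pos = i;
    qsort(items, n, sizeof(keyed_item), cmp_stable);
}
```

The cost is one extra field per item and a slightly longer comparison.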

Pragmatics of Sorting: Library Functions

Any reasonable programming language has a built-in sort routine as a library function.
You are almost always better off using the system sort than writing your own routine.
For example, the standard library for C contains the function qsort for sorting:

    void qsort(void *base, size_t nel, size_t width,
               int (*compare)(const void *, const void *));
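A short sketch of how qsort is typically called on an int array follows. The comparison function decides the order, so swapping its arguments reverses the sort. The names `cmp_int_asc`, `cmp_int_desc`, `sort_ascending`, and `sort_descending` are hypothetical.

```c
#include <stdlib.h>

/* Negative if *a < *b, zero if equal, positive if *a > *b. */
static int cmp_int_asc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Reversing the arguments reverses the resulting order. */
static int cmp_int_desc(const void *a, const void *b)
{
    return cmp_int_asc(b, a);
}

void sort_ascending(int a[], size_t n)
{
    qsort(a, n, sizeof(int), cmp_int_asc);
}

void sort_descending(int a[], size_t n)
{
    qsort(a, n, sizeof(int), cmp_int_desc);
}
```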

Selection Sort

Selection sort scans through the entire array, repeatedly finding the smallest remaining element.

For i = 1 to n
    A: Find the smallest of the first n − i + 1 items.
    B: Pull it out of the array and put it first.

Selection sort takes O(n(T(A) + T(B))) time.
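An array-based version of the two steps might look like the sketch below, where step A is a linear scan and step B is a single swap. The function name `selection_sort` is a hypothetical name for this illustration.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order in place. */
void selection_sort(int a[], size_t n)
{
    size_t i, j, min;

    for (i = 0; i + 1 < n; i++) {
        /* Step A: find the smallest element of the unsorted part a[i..n-1]. */
        min = i;
        for (j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;

        /* Step B: move it to the front of the unsorted part. */
        if (min != i) {
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```

Here T(A) is O(n) and T(B) is O(1), so the whole sort is quadratic.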

The Data Structure Matters

Using arrays or unsorted linked lists as the data structure, operation A takes O(n) time and operation B takes O(1), for an O(n^2) selection sort.
Using balanced search trees or heaps, both of these operations can be done within O(lg n) time, for an O(n log n) selection sort, balancing the work and achieving a better tradeoff.
Key question: “Can we use a different data structure?”

Heap Definition

A binary heap is defined to be a binary tree with a key in each node such that:

1. All leaves are on, at most, two adjacent levels.

2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.

3. The key in root is ≤ all its children, and the left and right subtrees are again binary heaps.

Conditions 1 and 2 specify the shape of the tree, and condition 3 the labeling of the tree.

Binary Heaps

Heaps maintain a partial order on the set of elements which is weaker than the sorted order (so it can be efficient to maintain) yet stronger than random order (so the minimum element can be quickly identified).

[Figure: a heap-labeled tree of important years (left), with the corresponding implicit heap representation (right).]

Array-Based Heaps

The most natural representation of this binary tree would involve storing each key in a node with pointers to its two children.
However, we can store a tree as an array of keys, using the position of the keys to implicitly satisfy the role of the pointers.
The left child of k sits in position 2k and the right child in 2k + 1.
The parent of k is in position ⌊k/2⌋.
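The position arithmetic can be sketched as two small helpers. The names `pq_parent` and `pq_young_child` appear in the lecture's later code, but their bodies are not shown on the slides, so the definitions below are an assumption: positions are 1-based, and the root reports −1 as its parent, as the bubble-up code expects.

```c
/* Position of the parent of position n, or -1 if n is the root. */
int pq_parent(int n)
{
    if (n == 1)
        return -1;
    return n / 2;           /* integer division gives floor(n/2) */
}

/* Position of the left ("younger") child of position n.
   The right child sits at pq_young_child(n) + 1. */
int pq_young_child(int n)
{
    return 2 * n;
}
```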

Can we Implicitly Represent Any Binary Tree?

The implicit representation is only efficient if the tree is not sparse. A tree of height h is sparse when its number of nodes n < 2^h.
All missing internal nodes still take up space in our structure.
This is why we insist on heaps as being as balanced/full at each level as possible.
The array-based representation is also not as flexible to arbitrary modifications as a pointer-based tree.

Constructing Heaps

Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.
If the new element is smaller than its parent, swap their positions and recur.
Since all but the last level is always filled, the height h of an n element heap is bounded because:

\[ \sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \geq n \]

so h = ⌊lg n⌋.
Doing n such insertions takes Θ(n log n) time in the worst case, since each of the last n/2 insertions can require Θ(log n) time.

Heap Insertion

    void pq_insert(priority_queue *q, item_type x)
    {
        if (q->n >= PQ_SIZE)
            printf("Warning: overflow insert\n");
        else {
            q->n = (q->n) + 1;          /* claim the left-most open spot */
            q->q[ q->n ] = x;
            bubble_up(q, q->n);         /* restore heap order */
        }
    }

Bubble Up

    void bubble_up(priority_queue *q, int p)
    {
        if (pq_parent(p) == -1) return;     /* at root of heap, no parent */

        if (q->q[pq_parent(p)] > q->q[p]) {
            pq_swap(q, p, pq_parent(p));
            bubble_up(q, pq_parent(p));
        }
    }

Bubble Down or Heapify

Robert Floyd found a better way to build a heap, using a merge procedure called heapify.
Given two heaps and a fresh element, they can be merged into one by making the new one the root and bubbling down.

Build-heap(A)
    n = |A|
    For i = ⌊n/2⌋ to 1 do
        Heapify(A,i)
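The Build-heap loop can be sketched on a plain int array as below. This is an illustration under assumptions, not the lecture's code: it uses a min-heap, stores keys in a[1..n] with a[0] unused so that the position arithmetic matches the slides, and the names `heapify_down` and `build_heap` are hypothetical.

```c
/* Restores min-heap order at position p of the 1-based array a[1..n],
   assuming both subtrees below p are already heaps. */
static void heapify_down(int a[], int n, int p)
{
    int c = 2 * p;          /* position of the left child */
    int min_index = p;
    int i;

    for (i = 0; i <= 1; i++)
        if (c + i <= n && a[c + i] < a[min_index])
            min_index = c + i;

    if (min_index != p) {
        int tmp = a[p];
        a[p] = a[min_index];
        a[min_index] = tmp;
        heapify_down(a, n, min_index);
    }
}

/* Turns a[1..n] into a min-heap. Positions above n/2 are leaves,
   which are already heaps of size 1, so the loop starts at n/2. */
void build_heap(int a[], int n)
{
    int i;
    for (i = n / 2; i >= 1; i--)
        heapify_down(a, n, i);
}
```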

Bubble Down Implementation

    void bubble_down(priority_queue *q, int p)
    {
        int c;              /* child index */
        int i;              /* counter */
        int min_index;      /* index of lightest child */

        c = pq_young_child(p);
        min_index = p;

        for (i = 0; i <= 1; i++)
            if ((c + i) <= q->n) {
                if (q->q[min_index] > q->q[c + i]) min_index = c + i;
            }

        if (min_index != p) {
            pq_swap(q, p, min_index);
            bubble_down(q, min_index);
        }
    }

Exact Analysis of Heapify

In fact, heapify performs better than O(n log n), because most of the heaps we merge are extremely small.
It follows exactly the same analysis as with dynamic arrays (Chapter 3).
In a full binary tree on n nodes, there are at most ⌈n/2^(h+1)⌉ nodes of height h, so the cost of building a heap is:

\[ \sum_{h=0}^{\lfloor \lg n \rfloor} \lceil n/2^{h+1} \rceil \, O(h) = O\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^{h} \right) \]

Since this sum is not quite a geometric series, we can’t apply the usual identity to get the sum. But it should be clear that the series converges.

Proof of Convergence (*)

The identity for the sum of a geometric series, valid for |x| < 1, is

\[ \sum_{k=0}^{\infty} x^k = \frac{1}{1 - x} \]

If we take the derivative of both sides, ...

\[ \sum_{k=0}^{\infty} k x^{k-1} = \frac{1}{(1 - x)^2} \]

Multiplying both sides of the equation by x gives:

\[ \sum_{k=0}^{\infty} k x^k = \frac{x}{(1 - x)^2} \]

Substituting x = 1/2 gives a sum of (1/2)/(1/4) = 2, so Build-heap uses at most 2n comparisons and thus linear time.

Is our Analysis Tight?

“Are we doing a careful analysis? Might our algorithm be faster than it seems?”
Doing at most x operations of at most y time each takes total time O(xy).
However, if we overestimate too much, our bound may not be as tight as it should be!

Heapsort

Heapify can be used to construct a heap, using the observation that an isolated element forms a heap of size 1.

Heapsort(A)
    Build-heap(A)
    for i = n to 1 do
        swap(A[1],A[i])
        n = n − 1
        Heapify(A,1)

Exchanging the maximum element (the root, when the heap is ordered with the largest key on top) with the last element and calling heapify repeatedly gives an O(n lg n) sorting algorithm. Why is it not O(n)?
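A self-contained sketch of the Heapsort pseudocode follows. It assumes a max-heap, since the pseudocode moves the root to the end of the array, and it keeps the keys in a[1..n] with a[0] unused to match the slides' 1-based positions. The names `sift_down_max` and `heapsort_ascending` are hypothetical.

```c
/* Restores max-heap order at position p of the 1-based array a[1..n],
   assuming both subtrees below p are already max-heaps. */
static void sift_down_max(int a[], int n, int p)
{
    int c = 2 * p;          /* position of the left child */
    int max_index = p;
    int i;

    for (i = 0; i <= 1; i++)
        if (c + i <= n && a[c + i] > a[max_index])
            max_index = c + i;

    if (max_index != p) {
        int tmp = a[p];
        a[p] = a[max_index];
        a[max_index] = tmp;
        sift_down_max(a, n, max_index);
    }
}

/* Sorts a[1..n] into ascending order in place. */
void heapsort_ascending(int a[], int n)
{
    int i;

    for (i = n / 2; i >= 1; i--)        /* Build-heap: linear time */
        sift_down_max(a, n, i);

    for (i = n; i >= 2; i--) {          /* n - 1 extractions, O(lg n) each */
        int tmp = a[1];                 /* move the current maximum to its */
        a[1] = a[i];                    /* final place at the end          */
        a[i] = tmp;
        sift_down_max(a, i - 1, 1);     /* the heap shrinks by one */
    }
}
```

Building the heap is linear, but each of the n − 1 extractions can cost up to lg n, which is why the sort as a whole is not O(n).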

Priority Queues

Priority queues are data structures which provide extra flexibility over sorting.
This is important because jobs often enter a system at arbitrary intervals. It is more cost-effective to insert a new job into a priority queue than to re-sort everything on each new arrival.

Priority Queue Operations

The basic priority queue supports three primary operations:

• Insert(Q,x): Given an item x with key k, insert it into the priority queue Q.

• Find-Minimum(Q) or Find-Maximum(Q): Return a pointer to the item whose key value is smaller (larger) than any other key in the priority queue Q.

• Delete-Minimum(Q) or Delete-Maximum(Q): Remove the item from the priority queue Q whose key is minimum (maximum).

Each of these operations can be easily supported using heaps or balanced binary trees in O(log n) time.
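The three operations can be sketched on an array-based min-heap as follows. This is an illustration under assumptions rather than the lecture's own interface: the type `min_pq`, the capacity `PQ_CAPACITY`, and the `mpq_*` function names are all hypothetical, and the find and delete operations assume a non-empty queue.

```c
#include <stdbool.h>

#define PQ_CAPACITY 64          /* hypothetical fixed capacity */

typedef struct {
    int q[PQ_CAPACITY + 1];     /* keys live in q[1..n]; q[0] is unused */
    int n;                      /* number of keys currently stored */
} min_pq;

static void mpq_swap(min_pq *pq, int i, int j)
{
    int tmp = pq->q[i];
    pq->q[i] = pq->q[j];
    pq->q[j] = tmp;
}

void mpq_init(min_pq *pq)
{
    pq->n = 0;
}

/* Insert: place x in the next open spot and bubble it up. O(log n). */
bool mpq_insert(min_pq *pq, int x)
{
    int p;
    if (pq->n >= PQ_CAPACITY) return false;     /* queue is full */
    pq->n = pq->n + 1;
    pq->q[pq->n] = x;
    for (p = pq->n; p > 1 && pq->q[p / 2] > pq->q[p]; p /= 2)
        mpq_swap(pq, p, p / 2);
    return true;
}

/* Find-Minimum: the root holds the smallest key. O(1). */
int mpq_find_min(const min_pq *pq)
{
    return pq->q[1];
}

/* Delete-Minimum: move the last key to the root and bubble it down. O(log n). */
int mpq_delete_min(min_pq *pq)
{
    int min = pq->q[1];
    int p = 1;

    pq->q[1] = pq->q[pq->n];
    pq->n = pq->n - 1;

    for (;;) {
        int c = 2 * p;
        int smallest = p;
        if (c <= pq->n && pq->q[c] < pq->q[smallest]) smallest = c;
        if (c + 1 <= pq->n && pq->q[c + 1] < pq->q[smallest]) smallest = c + 1;
        if (smallest == p) break;
        mpq_swap(pq, p, smallest);
        p = smallest;
    }
    return min;
}
```

Repeatedly calling the delete operation returns the keys in ascending order, which is the link between priority queues and sorting.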

Applications of Priority Queues: Dating

What data structure should be used to suggest who to ask out next for a date?
It needs to support retrieval by desirability, not name.
Desirability changes (up or down), so you can re-insert the max with the new score after each date.
New people you meet get inserted with your observed desirability level.
There is never a reason to delete anyone until they rise to the top.

Applications of Priority Queues: Discrete Event Simulations

In simulations of airports, parking lots, and jai-alai, priority queues can be used to maintain who goes next.
The stack and queue orders are just special cases of orderings.
In real life, certain people cut in line.

Applications of Priority Queues: Greedy Algorithms

In greedy algorithms, we always pick the next thing which locally maximizes our score. By placing all the things in a priority queue and pulling them off in order, we can improve performance over linear search or sorting, particularly if the weights change.
War Story: sequential strips in triangulations