Sorting Importance of sorting Quicksort Lower bounds for comparison-based methods Heapsort...

Sorting • Importance of sorting • Quicksort • Lower bounds for comparison-based methods • Heapsort • Non-comparison based sorting
  • Date posted: 21-Dec-2015

Page 1: Sorting Importance of sorting Quicksort Lower bounds for comparison-based methods Heapsort Non-comparison based sorting.

Sorting

• Importance of sorting

• Quicksort

• Lower bounds for comparison-based methods

• Heapsort

• Non-comparison based sorting

Why don't CS profs ever stop talking about sorting?!

• Computers spend more time sorting than anything else, historically 25% on mainframes.

• Sorting is the best studied problem in computer science, with a variety of different algorithms known.

• Most of the interesting ideas we encounter in the course are taught in the context of sorting, such as divide-and-conquer, randomized algorithms, and lower bounds.

You should have seen most of the algorithms already, so we will concentrate on the analysis.

Applications of Sorting

• Closest Pair

• Element Uniqueness

• Frequency Distribution

• Selection of Kth largest element

• Convex Hulls
  – See next slide!
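Two of the applications listed above become short once the data is sorted, because equal or nearby values end up adjacent. The following is a minimal sketch for one-dimensional numbers only; the function names `has_duplicates` and `closest_pair_1d` are illustrative and not from the slides.

```python
def has_duplicates(items):
    """Element uniqueness: after sorting, equal items must be neighbours."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def closest_pair_1d(numbers):
    """Closest pair on a line: the closest two values are adjacent once sorted."""
    ordered = sorted(numbers)
    if len(ordered) < 2:
        raise ValueError("need at least two numbers")
    # Scan neighbouring pairs and keep the one with the smallest gap.
    return min(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


print(has_duplicates([3, 1, 2]))
print(closest_pair_1d([10, 4, 7, 8]))
```

Both functions cost one O(n log n) sort plus a linear scan, instead of comparing all pairs.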

Convex Hulls

Huffman Codes

If you are trying to minimize the amount of space a text file is taking up, it is silly to assign each letter the same length (i.e. one byte) code.

Example: e is more common than q, a is more common than z.

If we were storing English text, we would want a and e to have shorter codes than q and z.
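As a rough illustration of the idea, the sketch below computes Huffman code lengths by repeatedly merging the two least frequent groups, using Python's `heapq` as the priority queue. The function name `huffman_code_lengths` and the sample string are assumptions made for this example, not part of the slides.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code of text."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in freq}
    # Heap entries: (total count, tie-breaker id, {symbol: depth so far}).
    heap = [(count, i, {symbol: 0})
            for i, (symbol, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, depths1 = heapq.heappop(heap)
        count2, _, depths2 = heapq.heappop(heap)
        # Merging two groups pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in depths1.items()}
        merged.update({s: d + 1 for s, d in depths2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


lengths = huffman_code_lengths("eeeeeeeeaaaaqz")
print(lengths)
```

In this sample, the frequent letter e gets a shorter code than the rare letters q and z.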

Example Problems

a. You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.

b. You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each of those 30 companies.

c. You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

Quicksort

Although mergesort is O(n log n), it is awkward to implement on arrays, since merging needs extra space. In practice, Quicksort is usually the fastest sorting algorithm.

Example: Pivot about 10

17 12 6 23 19 8 5 10 - before

6 8 5 10 17 12 23 19 - after

The pivot is now in its final sorted position, and every other number is on the correct side of it: smaller numbers before, larger numbers after.

Quicksort Walkthrough

17 12 6 23 19 8 5 10
6 8 5 10 17 12 23 19
5 6 8 17 12 19 23
6 8 12 17 23
6 17

5 6 8 10 12 17 19 23

Pseudocode

Sort(A) {
    Quicksort(A, 1, n);
}

Quicksort(A, low, high) {
    if (low < high) {
        pivotLocation = Partition(A, low, high);
        Quicksort(A, low, pivotLocation - 1);
        Quicksort(A, pivotLocation + 1, high);
    }
}

Pseudocode

int Partition(A, low, high) {
    pivot = A[high];
    leftwall = low - 1;
    for i = low to high - 1 {
        if (A[i] < pivot) then {
            leftwall = leftwall + 1;
            swap(A[i], A[leftwall]);
        }
    }
    // Move the pivot into place only after the scan has finished.
    swap(A[high], A[leftwall + 1]);
    return leftwall + 1;
}
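The pseudocode translates almost line for line into Python. The version below is a sketch using 0-based indices (the slides use 1-based), with the pivot swapped into place once the scan over the sub-array is complete.

```python
def partition(a, low, high):
    """Partition a[low..high] around a[high]; return the pivot's final index."""
    pivot = a[high]
    leftwall = low - 1
    for i in range(low, high):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    # Everything up to leftwall is smaller than the pivot, so it goes next.
    a[high], a[leftwall + 1] = a[leftwall + 1], a[high]
    return leftwall + 1


def quicksort(a, low=0, high=None):
    """Sort the list a in place."""
    if high is None:
        high = len(a) - 1
    if low < high:
        pivot_location = partition(a, low, high)
        quicksort(a, low, pivot_location - 1)
        quicksort(a, pivot_location + 1, high)


data = [17, 12, 6, 23, 19, 8, 5, 10]
quicksort(data)
print(data)  # [5, 6, 8, 10, 12, 17, 19, 23]
```

This partition scheme places the pivot correctly but does not preserve the original relative order on each side, so intermediate arrays can differ from the hand-drawn example while the final result is the same.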

Best Case for Quicksort

Worst Case for Quicksort

Intuition: The Average Case

0 n/4 n/2 3n/4 n

Anywhere in the middle half is a decent partition

$(3/4)^h \, n = 1 \Rightarrow n = (4/3)^h$

$\log n = h \log(4/3)$

$h = \log n / \log(4/3) \approx 2.41 \log_2 n$

What have we shown?

About $2.41 \log_2 n$ levels of decent partitions suffice to sort an array of n elements.

But if we just take arbitrary pivot points, how often will they, in fact, be decent?

Since any number ranked between n/4 and 3n/4 would make a decent pivot, we get one half the time on average.

Therefore, on average we will need about $2 \times 2.41 \log_2 n \approx 4.8 \log_2 n$ levels of partitioning to sort, which is still O(log n) levels of O(n) work each.
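The two quantities in this argument can be checked numerically: the number of levels $\log n / \log(4/3)$ needed if every partition is decent, and the fraction of randomly chosen pivots that land in the middle half. This is a small sketch; the function names, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random


def levels_of_decent_partitions(n):
    """Solve (3/4)^h * n = 1 for h."""
    return math.log(n) / math.log(4 / 3)


def fraction_of_decent_pivots(n, trials=10_000, seed=0):
    """Estimate how often a uniformly random rank falls in [n/4, 3n/4)."""
    rng = random.Random(seed)
    decent = sum(1 for _ in range(trials) if n / 4 <= rng.randrange(n) < 3 * n / 4)
    return decent / trials


n = 1024
print(levels_of_decent_partitions(n), math.log2(n))
print(fraction_of_decent_pivots(n))
```

For n = 1024 the first line prints a height of roughly 24 next to $\log_2 n = 10$, and the estimated fraction comes out close to one half.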

Quicksort in the real world…

Average-case Analysis

• Let X denote the random variable that represents the total number of comparisons performed

• Let $X_{ij}$ = probability that the ith smallest element and jth smallest element are compared

• $E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$

Computing Xij

• Observation
  – All comparisons are between a pivot element and another element
  – If an item k is chosen as pivot where i < k < j, then items i and j will not be compared

• $X_{ij} = 2/(j-i+1)$
  – Item i or j must be chosen as a pivot before any other item in the interval [i..j]

Computing E[X]

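$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$

$= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1}$ (substituting k = j - i)

$\le \sum_{i=1}^{n-1} 2 H_{n-i+1}$

$\le \sum_{i=1}^{n-1} 2 H_n$

$= 2(n-1) H_n$

Here $H_n = 1 + 1/2 + \dots + 1/n$ is the nth harmonic number, which is O(log n), so E[X] is O(n log n).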
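The double sum and its bound can be evaluated exactly for small n. The sketch below uses Python's `fractions.Fraction` to avoid rounding; the helper names are made up for this example.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def expected_comparisons(n):
    """Sum of 2/(j-i+1) over all pairs 1 <= i < j <= n."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))


def upper_bound(n):
    """The bound 2(n-1)H_n from the last line of the derivation."""
    return 2 * (n - 1) * harmonic(n)


for n in (2, 3, 10, 100):
    print(n, float(expected_comparisons(n)), float(upper_bound(n)))
```

The exact sum always stays below the bound, and both grow on the order of n log n.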

Avoiding worst-case

• Understanding quicksort’s worst-case

• Methods for avoiding it
  – Pivot strategies
  – Randomization

Understanding the worst case

A B D F H J K
A B D F H J
A B D F H
A B D F
A B D
A B
A

The worst case occurs when the input is already sorted, which is a likely case for many applications.
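To see the quadratic behaviour concretely, the sketch below counts the comparisons made by a quicksort that always uses the last element as the pivot. It uses an explicit stack instead of recursion so that long sorted inputs do not hit Python's recursion limit; the function name `count_comparisons` is an illustrative choice.

```python
def count_comparisons(values):
    """Count element comparisons made by last-element-pivot quicksort."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        pivot = a[high]
        leftwall = low - 1
        for i in range(low, high):
            comparisons += 1
            if a[i] < pivot:
                leftwall += 1
                a[i], a[leftwall] = a[leftwall], a[i]
        a[high], a[leftwall + 1] = a[leftwall + 1], a[high]
        # Sub-arrays on either side of the pivot at index leftwall + 1.
        stack.append((low, leftwall))
        stack.append((leftwall + 2, high))
    return comparisons


print(count_comparisons("ABDFHJK"))   # 21 = 6 + 5 + 4 + 3 + 2 + 1
print(count_comparisons(range(100)))  # 4950 = 100 * 99 / 2
```

On sorted input every partition peels off only the pivot, so the comparison count is n(n-1)/2.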

Pivot Strategies

• Use the middle Element of the sub-array as the pivot.

• Use the median element of (first, middle, last) to guard against pre-sorted input.

What is the worst-case performance for these pivot selection mechanisms?
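A median-of-three pivot choice might be sketched as follows. The helper name `median_of_three_index` is hypothetical; it returns an index, which a caller would swap with the last element before running a last-element partition.

```python
def median_of_three_index(a, low, high):
    """Return the index of the median of a[low], a[mid], a[high]."""
    mid = (low + high) // 2
    candidates = sorted([(a[low], low), (a[mid], mid), (a[high], high)])
    return candidates[1][1]


already_sorted = [1, 2, 3, 4, 5, 6, 7]
print(median_of_three_index(already_sorted, 0, len(already_sorted) - 1))  # 3
```

On already-sorted input this picks the middle element, which splits the sub-array evenly.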

Randomization Techniques

• Make chance of worst-case run time equally small for all inputs

• Methods
  – Choose pivot element randomly from range [low..high]
  – Initially permute the array
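The first method can be sketched by swapping a randomly chosen element into the last position before partitioning. The function name and the optional `rng` parameter are assumptions for this example; `random.randint(low, high)` includes both endpoints.

```python
import random


def randomized_quicksort(a, low=0, high=None, rng=random):
    """Sort a in place, choosing each pivot uniformly from a[low..high]."""
    if high is None:
        high = len(a) - 1
    if low >= high:
        return
    # Move a random element to the end, then partition around it.
    r = rng.randint(low, high)
    a[r], a[high] = a[high], a[r]
    pivot = a[high]
    leftwall = low - 1
    for i in range(low, high):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[high], a[leftwall + 1] = a[leftwall + 1], a[high]
    randomized_quicksort(a, low, leftwall, rng)
    randomized_quicksort(a, leftwall + 2, high, rng)


data = list(range(200))  # already sorted: bad for a fixed last-element pivot
randomized_quicksort(data)
print(data[:5], data[-5:])
```

The second method, shuffling the array once up front (for example with `random.shuffle`), has a similar effect with a fixed pivot rule.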

Is Quicksort really faster than Mergesort?

Since Quicksort is Θ(n log n) on average and Selection Sort is Θ(n²), there isn't any debate about which is faster.

How can we compare two Θ(n log n) algorithms to know which one is faster?

Using the RAM model and big-O notation, we can't!

If all of the algorithms are well implemented, Quicksort is typically 2-3 times faster than the others, but this comes down to constant factors and implementation details rather than asymptotic behavior.

Possible reasons for not choosing quicksort

• What do you know about the input data?

• Is the data already partially sorted?

• Do we know the distribution of the keys?

• Are your keys very long or hard to compare?

• Is the range of possible keys very small?

Optimizing Quicksort

Using randomization: makes the worst case very unlikely, and ensures that no particular input is reliably bad.

Median of three: Can be slightly faster than randomization for somewhat sorted data.

Leave small sub-arrays for insertion sort: Insertion sort can be faster, in practice, for small values of n.

Do the smaller partition first: minimizes runtime memory (recursing into the smaller side and looping on the larger keeps the stack O(log n) deep).
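The sketch below combines three of these ideas: a random pivot, an insertion-sort cutoff for small sub-arrays, and recursion into the smaller side only. The cutoff value of 16 and the name `hybrid_quicksort` are assumptions for illustration; the slides do not give a specific threshold.

```python
import random

CUTOFF = 16  # assumed threshold; tune by measurement


def insertion_sort(a, low, high):
    """Sort a[low..high] in place."""
    for i in range(low + 1, high + 1):
        key = a[i]
        j = i - 1
        while j >= low and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def partition(a, low, high):
    """Partition a[low..high] around a randomly chosen pivot."""
    r = random.randint(low, high)
    a[r], a[high] = a[high], a[r]
    pivot = a[high]
    leftwall = low - 1
    for i in range(low, high):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[high], a[leftwall + 1] = a[leftwall + 1], a[high]
    return leftwall + 1


def hybrid_quicksort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    while high - low + 1 > CUTOFF:
        p = partition(a, low, high)
        # Recurse into the smaller side; keep looping on the larger one.
        if p - low < high - p:
            hybrid_quicksort(a, low, p - 1)
            low = p + 1
        else:
            hybrid_quicksort(a, p + 1, high)
            high = p - 1
    insertion_sort(a, low, high)


data = [random.randrange(1000) for _ in range(500)]
hybrid_quicksort(data)
print(data == sorted(data))
```

Because each recursive call handles at most half of the current range, the recursion depth stays logarithmic even when the pivots are poor.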

Is Linear Sorting Possible?

Any comparison-based sorting program can be thought of as defining a decision tree of possible executions.

Example Decision Tree

How big is the decision tree?

Since different permutations of n elements require different sequences of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree, i.e. at least n! different leaves in the tree.

Since a binary tree of height h has at most $2^h$ leaves, we know that $n! \le 2^h$, or $h \ge \log(n!)$.

By inspection, $n! > (n/2)^{n/2}$, since the last n/2 terms of the product are each greater than n/2. Thus $h > (n/2)\log(n/2)$, which is Ω(n log n).
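The two bounds can be compared numerically. This is a small sketch with illustrative function names; it uses base-2 logarithms, matching the binary decision tree.

```python
import math


def min_comparisons(n):
    """Smallest height h with 2^h >= n!, i.e. ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))


def simple_bound(n):
    """The weaker bound (n/2) * log2(n/2) obtained by inspection."""
    return (n / 2) * math.log2(n / 2)


for n in (3, 4, 16, 100):
    print(n, min_comparisons(n), round(simple_bound(n), 1))
```

For n = 3 the tree needs height 3, so no comparison sort can always sort three items with only two comparisons.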

Heaps

• Definition

• Operations
  – Insertion
  – Heap construction
  – Heap extract max

• Heapsort

Definition

A binary heap is defined to be a binary tree with a key in each node such that:

1: All leaves are on, at most, two adjacent levels.

2: All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.

3: The key in the root is at least as large as the keys of its children, and the left and right subtrees are again binary heaps.

Conditions 1 and 2 specify shape of the tree, and condition 3 the labeling of the tree.

Example Heap

Are these legal?

Partial Order Property

The ancestor relation in a heap defines a partial order on its elements, which means it is reflexive, anti-symmetric, and transitive.

Reflexive: x is an ancestor of itself.

Anti-symmetric: if x is an ancestor of y and y is an ancestor of x, then x=y.

Transitive: if x is an ancestor of y and y is an ancestor of z, x is an ancestor of z.

Partial orders can be used to model hierarchies with incomplete information or equal-valued elements.

Insertion Operation

• Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.

• If the new element is greater than its parent, swap their positions and recur.

The height h of an n element heap is bounded because:

$\sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \ge n$

so $h = \lfloor \log n \rfloor$, and insertions take O(log n) time.
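Insertion can be sketched on a Python list used as the heap array. This version is 0-based, so the parent of index `i` is `(i - 1) // 2`; the function name `heap_insert` is an illustrative choice.

```python
def heap_insert(heap, key):
    """Insert key into a max-heap stored in a list, bubbling it up."""
    heap.append(key)              # left-most open spot
    child = len(heap) - 1
    while child > 0:
        parent = (child - 1) // 2
        if heap[child] <= heap[parent]:
            break                 # heap order restored
        heap[child], heap[parent] = heap[parent], heap[child]
        child = parent


heap = []
for key in [5, 3, 17, 10, 84, 19, 6, 22, 9]:
    heap_insert(heap, key)
print(heap[0])  # 84, the maximum
```

Each insertion climbs at most one root-to-leaf path, which is where the O(log n) bound comes from.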

Heap Construction

The bottom up insertion algorithm gives a good way to build a heap, but Robert Floyd found a better way, using a merge procedure called heapify.

Given two heaps and a fresh element, they can be merged into one by making the new entry the root and trickling down.

To convert an array of integers into a heap, place them all into a binary tree, and call heapify on each node, working from the bottom of the tree up to the root.

How long would this take?

Heapify Example

Try to create a heap with the entries:

5, 3, 17, 10, 84, 19, 6, 22, 9
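One way to work this example is a bottom-up construction on a 0-based array, where the children of index `i` are `2i + 1` and `2i + 2`. The code below is a sketch of that approach; `heapify` here takes an explicit `size` argument, which is an assumption of this example rather than the signature used in the slides.

```python
def heapify(a, i, size):
    """Trickle a[i] down until the subtree rooted at i is a max-heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_heap(a):
    """Heapify every internal node, from the last one back to the root."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a))


entries = [5, 3, 17, 10, 84, 19, 6, 22, 9]
build_heap(entries)
print(entries)  # [84, 22, 19, 10, 3, 17, 6, 5, 9]
```

Leaves are already one-element heaps, so the loop starts at the last node that has a child.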

Heap Extract Max

if heap-size(A) < 1

then error “Heap Underflow”;

max = A[1];

A[1] = A[heap-size(A)];

heap-size(A)--;

Heapify(A, 1);

return max;

Heap Sort

To sort using the heap data structure, we first build the heap, and then just repeatedly extract the maximum.

Build Heap = O(n)

Extract Maximum = O(log n)

Therefore:

Heap Sort = O(n) + n O(log n) = O(n log n)
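Put together, the procedure could look like the sketch below: build a max-heap, then extract the maximum n times. It is written for clarity rather than speed and returns a new list; an in-place variant would instead swap each maximum to the end of the array. The function names are illustrative.

```python
def heapify(a, i):
    """Trickle a[i] down within the max-heap stored in list a."""
    size = len(a)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def extract_max(heap):
    """Remove and return the largest key, as in Heap Extract Max."""
    if not heap:
        raise IndexError("Heap Underflow")
    maximum = heap[0]
    last = heap.pop()          # shrink the heap by one
    if heap:
        heap[0] = last         # move the last key to the root
        heapify(heap, 0)
    return maximum


def heapsort(values):
    """Return the values in ascending order."""
    heap = list(values)
    for i in range(len(heap) // 2 - 1, -1, -1):   # build heap: O(n)
        heapify(heap, i)
    descending = [extract_max(heap) for _ in range(len(heap))]  # n * O(log n)
    return descending[::-1]


print(heapsort([5, 3, 17, 10, 84, 19, 6, 22, 9]))
```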

Non-comparison Based Sorting

All the sorting algorithms we have seen assume binary comparisons as the basic primitive, questions of the form “is x before y?”.

Suppose you were given a deck of playing cards to sort. Most likely you would set up 13 piles and put all cards with the same number in one pile.

A 2 3 4 5 6 7 8 9 10 J Q K

Bucketsort

Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed.

We can set up n buckets, each responsible for an interval of m/n numbers from 1 to m:

[1, m/n]  [m/n+1, 2m/n]  [2m/n+1, 3m/n]  …
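A minimal bucketsort along these lines, assuming the inputs are integers in the range 1..m, might look like this. The function name and the use of Python's built-in `sorted` inside each bucket are choices made for this sketch.

```python
def bucket_sort(numbers, m):
    """Sort integers drawn from 1..m using len(numbers) buckets."""
    n = len(numbers)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in numbers:
        # Map 1..m onto bucket indices 0..n-1 in equal-width intervals.
        buckets[(x - 1) * n // m].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # each bucket is expected to be tiny
    return result


print(bucket_sort([29, 3, 17, 8, 25, 12], 30))  # [3, 8, 12, 17, 25, 29]
```

With roughly uniform data each bucket holds about one element, so the total work is close to linear.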

Bucketsort

We can use bucketsort effectively whenever we understand the distribution of the data.

However, bad things happen when we assume the wrong distribution.

[1, m/n]  [m/n+1, 2m/n]  [2m/n+1, 3m/n]  …
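The effect of a wrong distribution assumption shows up in the bucket sizes. The sketch below counts how many of n integers from 1..m land in each of n equal-width buckets; the data sets are invented for illustration.

```python
def bucket_loads(numbers, m):
    """Return how many numbers fall into each of len(numbers) buckets."""
    n = len(numbers)
    loads = [0] * n
    for x in numbers:
        loads[(x - 1) * n // m] += 1
    return loads


m = 1000
uniform = list(range(1, 1001, 10))     # 100 values spread evenly over 1..1000
skewed = list(range(1, 11)) * 10       # 100 values, all between 1 and 10

print(max(bucket_loads(uniform, m)))   # 1
print(max(bucket_loads(skewed, m)))    # 100
```

In the skewed case every value lands in the first bucket, so the work falls back to whatever algorithm sorts that single bucket.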

Real World Distributions

Consider the distribution of names in a telephone book.

• Will there be a lot of Ofrias?

• Will there be a lot of Smiths?

• Will there be a lot of Zuckers?

Make sure you understand your data, or use a good worst-case or randomized algorithm!