Posted on 18-Dec-2015

Sorting Algorithms

Motivation

Example: Phone Book Searching

If the phone book were in random order, we would probably never use the phone! Let’s say it takes ½ second to check each entry. There are 70,000 households in Ilam, so checking every entry takes 35,000 seconds, about 10 hours, to find one phone number!

Best time: ½ second (the first entry is the one we want). Average time: about 5 hours (we check half the entries).

Motivation

Because the phone book is sorted, we can:
Jump directly to the letter of the alphabet we are interested in.
Scan quickly to find the first two letters that are really close to the name we are interested in.
Flip whole pages at a time if we are not close enough.

The Big Idea

Take a set of N randomly ordered pieces of data $a_j$ and rearrange the data so that, for a relational operator R, $a_j \mathrel{R} a_{j+1}$ holds for every j with $0 \le j < N-1$:

$a_0 \mathrel{R} a_1 \mathrel{R} a_2 \mathrel{R} \cdots \mathrel{R} a_j \mathrel{R} \cdots \mathrel{R} a_{N-1}$

If R is <=, we are doing an ascending sort: each consecutive item in the list is at least as large as the previous one.

If R is >=, we are doing a descending sort: items get smaller (or stay equal) as we move down the list.
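The definition of "sorted" above can be checked directly: walk the list once and test R on every adjacent pair. A minimal C++ sketch; the function names holdsForAllAdjacent, isAscending and isDescending are hypothetical helpers, not from the slides (the standard library's std::is_sorted covers the ascending case).

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Returns true when a[j] R a[j+1] holds for every adjacent pair,
// i.e. a[0] R a[1] R ... R a[N-1]. R is any callable (int, int) -> bool.
template <typename Relation>
bool holdsForAllAdjacent(const std::vector<int>& a, Relation r) {
    for (std::size_t j = 0; j + 1 < a.size(); ++j) {
        if (!r(a[j], a[j + 1])) {
            return false;  // this pair breaks the ordering
        }
    }
    return true;  // empty and single-element lists are trivially ordered
}

// Ascending sort: R is <=
bool isAscending(const std::vector<int>& a) {
    return holdsForAllAdjacent(a, std::less_equal<int>());
}

// Descending sort: R is >=
bool isDescending(const std::vector<int>& a) {
    return holdsForAllAdjacent(a, std::greater_equal<int>());
}
```

Because R is <= rather than <, equal neighbours (such as 4 4 7) still count as ascending.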

Queue Example: Radix Sort

Also called bin sort:
Repeatedly shuffle data into small bins.
Collect data from the bins into a new deck.
Repeat until sorted.

What is the appropriate method of shuffling and collecting?
For integers, the key is to shuffle data into bins on a per-digit basis, starting with the rightmost (ones) digit.
Collect in order, from bin 0 to bin 9, and left to right within a bin.
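A single shuffle-and-collect pass can be sketched in C++ as below, assuming non-negative integers. The function name radixPass and the place parameter (1 for ones, 10 for tens, 100 for hundreds) are illustrative choices, not from the slides.

```cpp
#include <vector>

// One pass of radix (bin) sort on non-negative integers.
// place is the weight of the current digit: 1, 10, 100, ...
std::vector<int> radixPass(const std::vector<int>& data, int place) {
    std::vector<std::vector<int>> bins(10);  // bin 0 .. bin 9
    // Shuffle: append each number to the bin for its current digit,
    // so numbers keep their relative order inside a bin.
    for (int x : data) {
        bins[(x / place) % 10].push_back(x);
    }
    // Collect: bin 0 to bin 9, left to right within each bin.
    std::vector<int> collected;
    collected.reserve(data.size());
    for (const auto& bin : bins) {
        collected.insert(collected.end(), bin.begin(), bin.end());
    }
    return collected;
}
```

Calling radixPass on the data 459 254 472 534 649 239 432 654 477 with place = 1 gives the ones-digit ordering 472 432 254 534 654 477 459 649 239.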

Radix Sort: Ones Digit

Data: 459 254 472 534 649 239 432 654 477

Bin 0: (empty)
Bin 1: (empty)
Bin 2: 472 432
Bin 3: (empty)
Bin 4: 254 534 654
Bin 5: (empty)
Bin 6: (empty)
Bin 7: 477
Bin 8: (empty)
Bin 9: 459 649 239

After collecting: 472 432 254 534 654 477 459 649 239

Radix Sort: Tens Digit

Data: 472 432 254 534 654 477 459 649 239

Bin 0: (empty)
Bin 1: (empty)
Bin 2: (empty)
Bin 3: 432 534 239
Bin 4: 649
Bin 5: 254 654 459
Bin 6: (empty)
Bin 7: 472 477
Bin 8: (empty)
Bin 9: (empty)

After collecting: 432 534 239 649 254 654 459 472 477

Radix Sort: Hundreds Digit

Data: 432 534 239 649 254 654 459 472 477

Bin 0: (empty)
Bin 1: (empty)
Bin 2: 239 254
Bin 3: (empty)
Bin 4: 432 459 472 477
Bin 5: 534
Bin 6: 649 654
Bin 7: (empty)
Bin 8: (empty)
Bin 9: (empty)

Final Sorted Data: 239 254 432 459 472 477 534 649 654

Radix Sort Algorithm

Begin with the current digit as the ones digit
While there is still a digit on which to classify
{
    For each number in the master list,
        add that number to the appropriate sublist keyed on the current digit
    For each sublist from 0 to 9
        For each number in the sublist
            remove the number from the sublist and append it to a new master list
    Advance the current digit one place to the left.
}
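The radix sort pseudocode can be turned into a small C++ sketch, assuming non-negative int keys. It uses std::queue for the master list and for each of the ten sublists, since every list must be first-in, first-out; the name radixSort and the vector-in, vector-out interface are illustrative assumptions.

```cpp
#include <queue>
#include <vector>

// LSD radix sort for non-negative integers using FIFO queues.
std::vector<int> radixSort(const std::vector<int>& input) {
    std::queue<int> master;
    int maxValue = 0;
    for (int x : input) {
        master.push(x);
        if (x > maxValue) maxValue = x;
    }

    std::queue<int> bins[10];  // one sublist per digit value 0..9
    // place = 1 (ones), 10 (tens), ... while the largest number still has a digit there.
    for (long long place = 1; maxValue / place > 0; place *= 10) {
        // Move every number from the master list into the bin for its digit.
        while (!master.empty()) {
            int x = master.front();
            master.pop();
            bins[(x / place) % 10].push(x);
        }
        // Collect bins 0..9, in order, into the new master list.
        for (auto& bin : bins) {
            while (!bin.empty()) {
                master.push(bin.front());
                bin.pop();
            }
        }
    }

    std::vector<int> result;
    while (!master.empty()) {
        result.push_back(master.front());
        master.pop();
    }
    return result;
}
```

Using a long long for place keeps the multiplication by 10 from overflowing when the largest key has ten digits.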

Radix Sort and Queues

Each list (the master list holding all items, and the bins, one per digit value) needs to be first-in, first-out ordered, which makes each of them a perfect fit for a queue.

A Quick Tangent

How fast have the sorts you’ve seen before worked?
Bubble, Insertion, Selection: O(n^2)

We will see sorts that are better, and in fact optimal for general (comparison-based) sorting algorithms:
Merge sort: O(n log n); Quicksort: O(n log n) on average (O(n^2) in the worst case)

How fast is radix sort?

Analysis of Radix Sort

Let n be the number of items to sort.
The outer loop is controlled by the maximum length of the input numbers in digits (let this be d).

For every digit:
Assign each number to sort to a group (n operations).
Pull each number back into the master list (n operations).

Overall running time: 2 * n * d, which is O(n * d); when d is a fixed constant (for example, 3-digit numbers), this is O(n).
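To check the 2 * n * d count, here is a hedged C++ sketch that runs radix sort on non-negative integers while counting one operation per number dropped into a bin and one per number pulled back out, the same cost model as the analysis. The name radixOperationCount is hypothetical.

```cpp
#include <vector>

// Counts bin insertions plus collections while radix sorting non-negative ints.
// d is the number of digits in the largest value.
long long radixOperationCount(const std::vector<int>& data) {
    int maxValue = 0;
    for (int x : data) if (x > maxValue) maxValue = x;

    int d = 0;  // digits in the longest number
    for (int v = maxValue; v > 0; v /= 10) ++d;

    std::vector<int> master(data);
    long long ops = 0;
    long long place = 1;
    for (int pass = 0; pass < d; ++pass, place *= 10) {
        std::vector<std::vector<int>> bins(10);
        for (int x : master) {                 // n operations: assign to a group
            bins[(x / place) % 10].push_back(x);
            ++ops;
        }
        master.clear();
        for (const auto& bin : bins) {         // n operations: pull back into the master list
            for (int x : bin) {
                master.push_back(x);
                ++ops;
            }
        }
    }
    return ops;  // equals 2 * n * d
}
```

For the nine 3-digit numbers in the walkthrough, this reports 2 * 9 * 3 = 54 operations.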

Analysis of Radix Sort

O(n log n) is optimal for general (comparison-based) sorting algorithms.

Radix sort is O(n)? How does that work?

Radix sort is not a general sorting algorithm: it can’t sort arbitrary information. Rectangle objects, Automobile objects, etc. are no good.
It can sort items that can be broken into constituent pieces whose pieces can be ordered:
Integers (digits), Strings (characters)
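Strings work the same way when each character plays the role of a digit. A minimal C++ sketch, assuming every string has the same length width (the function name and that restriction are assumptions of this example), with one bin per possible byte value:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// LSD radix sort for equal-length strings: one pass per character position,
// rightmost position first, with 256 bins (one per byte value).
std::vector<std::string> radixSortStrings(std::vector<std::string> words, std::size_t width) {
    for (std::size_t pos = width; pos-- > 0;) {
        std::vector<std::vector<std::string>> bins(256);
        for (const auto& w : words) {
            bins[static_cast<unsigned char>(w[pos])].push_back(w);
        }
        words.clear();  // collect bins in byte order, left to right within a bin
        for (const auto& bin : bins) {
            words.insert(words.end(), bin.begin(), bin.end());
        }
    }
    return words;
}
```

Sorting dog cat ant cab bat with width 3 yields ant bat cab cat dog. Variable-length strings would need padding or a different strategy.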

Sorting Algorithms

What does sorting really require?
Compare pieces of data at different positions.
Swap the data at those positions until the order is correct.

Example: 20 3 18 9 5 is rearranged into 3 5 9 18 20

Selection Sort

#include <utility>  // std::swap

int minimumIndex(int* a, int first, int last);  // declared before use

void selectionSort(int* a, int size)
{
    for (int k = 0; k < size - 1; k++)
    {
        // Find the smallest item in a[k..size-1] and move it to position k
        int index = minimumIndex(a, k, size);
        std::swap(a[k], a[index]);
    }
}

// Returns the index of the smallest item in a[first..last-1]
int minimumIndex(int* a, int first, int last)
{
    int minIndex = first;
    for (int j = first + 1; j < last; j++)
    {
        if (a[j] < a[minIndex]) minIndex = j;
    }
    return minIndex;
}

Selection Sort

What is selection sort doing? Repeatedly:
Finding the smallest element by searching through the list
Inserting it at the front of the list (by swapping it with the front item)
Moving the “front of list” forward by 1

Selection Sort Step Through

Start: 20 3 18 9 5
k = 0: minimumIndex(a, 0, 5) = 1, swap(a[0], a[1]) → 3 20 18 9 5
k = 1: minimumIndex(a, 1, 5) = 4, swap(a[1], a[4]) → 3 5 18 9 20
k = 2: minimumIndex(a, 2, 5) = 3, swap(a[2], a[3]) → 3 5 9 18 20
k = 3: minimumIndex(a, 3, 5) = 3, swap(a[3], a[3]) → 3 5 9 18 20 (no change)
k = 4 = size-1: Done!
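The step-through can be reproduced with a tracing variant of selection sort that snapshots the array after every pass of the outer loop. A C++ sketch; selectionSortTrace is a hypothetical name, and it inlines the minimumIndex search rather than calling a separate function.

```cpp
#include <utility>
#include <vector>

// Selection sort that records the array after each outer-loop pass.
std::vector<std::vector<int>> selectionSortTrace(std::vector<int> a) {
    std::vector<std::vector<int>> states;
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size - 1; k++) {
        int minIndex = k;  // search a[k..size-1] for the smallest item
        for (int j = k + 1; j < size; j++) {
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[k], a[minIndex]);  // move it to the front of the unsorted part
        states.push_back(a);           // snapshot after pass k
    }
    return states;
}
```

For 20 3 18 9 5 it returns four snapshots: 3 20 18 9 5, then 3 5 18 9 20, then 3 5 9 18 20 twice.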

Cost of Selection Sort

How many times through the outer loop?
Iteration is for k = 0 to < (N-1) => N-1 times.

How many comparisons in minimumIndex?
It depends on the outer loop. Consider 5 elements:
k = 0: j = 1, 2, 3, 4
k = 1: j = 2, 3, 4
k = 2: j = 3, 4
k = 3: j = 4

Total comparisons is 4 + 3 + 2 + 1, which in general is (N-1) + (N-2) + (N-3) + … + 1.

What is that sum?

Cost of Selection Sort

$(N-1) + (N-2) + (N-3) + \cdots + 3 + 2 + 1$

Pair the largest term with the smallest, the next largest with the next smallest, and so on:

$[(N-1) + 1] + [(N-2) + 2] + [(N-3) + 3] + \cdots = N + N + N + \cdots$

How many repeated additions of N?

There were N-1 terms to add, and we grouped them two at a time, giving (N-1)/2 repeated additions of N:

$\sum_{i=1}^{N-1} i = \frac{N(N-1)}{2} \approx \frac{N^2}{2} = O(N^2)$ comparisons
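The formula can be checked by counting the a[j] < a[minIndex] comparisons directly. A C++ sketch with a hypothetical function name; note the count depends only on the number of items, never on their order.

```cpp
#include <utility>
#include <vector>

// Returns how many key comparisons selection sort makes on a.
long long selectionSortComparisons(std::vector<int> a) {
    long long comparisons = 0;
    const int size = static_cast<int>(a.size());
    for (int k = 0; k < size - 1; k++) {
        int minIndex = k;
        for (int j = k + 1; j < size; j++) {
            ++comparisons;  // one comparison per inner-loop step
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[k], a[minIndex]);
    }
    return comparisons;  // N(N-1)/2 for N items
}
```

Five items give 4 + 3 + 2 + 1 = 10 comparisons, and 100 items give 100 * 99 / 2 = 4950, whatever their initial order.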

Insertion Sort

void insertionSort(int* a, int size)
{
    for (int k = 1; k < size; k++)
    {
        int temp = a[k];      // the new item to place
        int position = k;

        // Shift larger items right by one until temp's slot is found
        while (position > 0 && a[position-1] > temp)
        {
            a[position] = a[position-1];
            position--;
        }
        a[position] = temp;   // insert the new item
    }
}

Insertion Sort

A list of size 1 (the first element) is already sorted.

Repeatedly:
Choose a new item to place in the list (a[k]).
Starting at the back of the list, if the new item is less than the item at the current position, shift the current data right by 1.
Repeat shifting until the new item is not less than the item in front of it.
Insert the new item.

Insertion Sort Step Through

Start: 20 3 18 9 5 (the single-card list A[0] = 20 is already sorted)
k = 1: move 3 left until it hits something smaller or the front of the list → 3 20 18 9 5 (now two sorted)
k = 2: move 18 left until it hits something smaller → 3 18 20 9 5 (now three sorted)
k = 3: move 9 left until it hits something smaller → 3 9 18 20 5 (now four sorted)
k = 4: move 5 left until it hits something smaller → 3 5 9 18 20 (now all five sorted)
Done
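A tracing variant of insertionSort reproduces this step-through by recording the array after each new item is inserted. A C++ sketch; insertionSortTrace is a hypothetical name.

```cpp
#include <vector>

// Insertion sort that records the array after each insertion.
std::vector<std::vector<int>> insertionSortTrace(std::vector<int> a) {
    std::vector<std::vector<int>> states;
    const int size = static_cast<int>(a.size());
    for (int k = 1; k < size; k++) {
        int temp = a[k];  // the new item to place
        int position = k;
        // Shift larger items right until the slot for temp is found.
        while (position > 0 && a[position - 1] > temp) {
            a[position] = a[position - 1];
            position--;
        }
        a[position] = temp;
        states.push_back(a);  // the first k+1 items are now sorted
    }
    return states;
}
```

For 20 3 18 9 5 the snapshots are 3 20 18 9 5, 3 18 20 9 5, 3 9 18 20 5 and 3 5 9 18 20.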

Cost of Insertion Sort

Outer loop: k = 1 to < size, i.e. k = 1, 2, 3, 4 for 5 elements => N-1 iterations.

Inner loop, worst case: compare against all items in the list (each new item is the new smallest thing):
k = 1: 1 step (position = k = 1, while position > 0)
k = 2: 2 steps [position = 2, 1]
k = 3: 3 steps [position = 3, 2, 1]
k = 4: 4 steps [position = 4, 3, 2, 1]

Again, the worst-case total number of comparisons is the sum of i from 1 to N-1, which is N(N-1)/2 = O(N^2).
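Counting how often insertion sort evaluates a[position-1] > temp shows both its worst case and its best case. A C++ sketch with a hypothetical function name; the while condition is split so each key comparison can be counted.

```cpp
#include <vector>

// Returns how many key comparisons insertion sort makes on a.
long long insertionSortComparisons(std::vector<int> a) {
    long long comparisons = 0;
    const int size = static_cast<int>(a.size());
    for (int k = 1; k < size; k++) {
        int temp = a[k];
        int position = k;
        while (position > 0) {
            ++comparisons;                        // one key comparison
            if (!(a[position - 1] > temp)) break; // found temp's slot early
            a[position] = a[position - 1];        // shift right by 1
            position--;
        }
        a[position] = temp;
    }
    return comparisons;
}
```

Five items in reverse order take 1 + 2 + 3 + 4 = 10 comparisons (the worst case, N(N-1)/2), while five items already in order take only 4 (N-1), because each while loop exits after its first comparison.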

Cost of Swaps

Selection Sort: the outer loop makes one call to swap(a[k], a[index]) per pass, so N-1 swaps in total, which is O(N) swaps.

Insertion Sort: the while loop does a shift (a[position] = a[position-1]) almost every time it does a comparison, so O(n^2) shifts.
Shifts are faster than swaps (1 step vs 3 steps).
Are we doing few enough of them to make up the difference?
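One way to explore that question is to run both sorts on the same input and count the data movement each performs. A C++ sketch; the MoveCounts struct and countMoves function are hypothetical names, and only swaps and shifts are counted (not insertion sort's copies into and out of temp).

```cpp
#include <utility>
#include <vector>

struct MoveCounts {
    long long selectionSwaps = 0;   // calls to swap (about 3 assignments each)
    long long insertionShifts = 0;  // a[position] = a[position-1] (1 assignment each)
};

// Runs selection sort and insertion sort on copies of input and counts moves.
MoveCounts countMoves(const std::vector<int>& input) {
    MoveCounts c;
    const int size = static_cast<int>(input.size());

    // Selection sort: exactly one swap per outer pass.
    std::vector<int> a(input);
    for (int k = 0; k < size - 1; k++) {
        int minIndex = k;
        for (int j = k + 1; j < size; j++)
            if (a[j] < a[minIndex]) minIndex = j;
        std::swap(a[k], a[minIndex]);
        ++c.selectionSwaps;
    }

    // Insertion sort: one shift for every larger item temp moves past.
    std::vector<int> b(input);
    for (int k = 1; k < size; k++) {
        int temp = b[k];
        int position = k;
        while (position > 0 && b[position - 1] > temp) {
            b[position] = b[position - 1];
            ++c.insertionShifts;
            position--;
        }
        b[position] = temp;
    }
    return c;
}
```

On five items in reverse order this reports 4 swaps against 10 shifts; on five items already in order it reports 4 swaps against 0 shifts, so the answer depends heavily on the input.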

Another Issue - Memory

Space requirements for each sort?
All of these sorts require the space to hold the array: O(N).
They require a temp variable for swaps.
They require a handful of counters.

They can all be done “in place”, so they are equivalent in terms of memory costs.

Not all sorts can be done in place, though!

Which O(n^2) Sort to Use?

Insertion sort is the winner:
Its worst case requires all the comparisons.
Most cases don’t (they jump out of the while loop early).
Selection sort uses for loops that go all the way through every time.

Tradeoffs

Given random data, when is it more efficient to just search, versus insertion sort and then search?

Assume Z searches.

Search on random data: $Z \cdot O(n)$

Sort and binary search: $O(n^2) + Z \log_2 n$
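The log2 n term comes from binary search, which halves the remaining range of a sorted array on every probe. A minimal C++ sketch on an ascending-sorted vector; binarySearch and its return-minus-one-if-absent convention are illustrative choices (the standard library offers std::binary_search and std::lower_bound).

```cpp
#include <vector>

// Returns the index of target in an ascending-sorted vector, or -1 if absent.
// Each iteration discards half of the remaining range: about log2(n) probes.
int binarySearch(const std::vector<int>& sorted, int target) {
    int lo = 0;
    int hi = static_cast<int>(sorted.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) {
            lo = mid + 1;  // target can only be in the right half
        } else {
            hi = mid - 1;  // target can only be in the left half
        }
    }
    return -1;
}
```

Searching 3 5 9 18 20 for 18 returns index 3; searching for 7 returns -1.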

Tradeoffs

Just searching costs no more than sorting first when:

$$Z n \le n^2 + Z \log_2 n$$
$$Z n - Z \log_2 n \le n^2$$
$$Z (n - \log_2 n) \le n^2$$
$$Z \le \frac{n^2}{n - \log_2 n}$$

For large n, $\log_2 n$ is dwarfed by n in $(n - \log_2 n)$, so

$$Z \le \frac{n^2}{n} = n \quad \text{(approximately)}$$

If we expect fewer than about n searches, just search; for more, sorting first pays off.
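The exact break-even point can be computed from the last inequality before the approximation. A C++ sketch that takes the same simplified cost model at face value (n per linear search, n^2 to insertion sort, log2 n per binary search, constants ignored); the function name breakEvenSearches is hypothetical.

```cpp
#include <cmath>

// Break-even number of searches Z* = n^2 / (n - log2 n) for n > 0.
// With fewer than Z* searches, just searching is cheaper under this model.
double breakEvenSearches(double n) {
    return (n * n) / (n - std::log2(n));
}
```

For n = 1024, log2 n = 10, so Z* = 1048576 / 1014, about 1034.1, which is already close to the approximation Z ≈ n.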

Improving Sorts

Better sorting algorithms rely on divide and conquer (recursion):
Find an efficient technique for splitting the data.
Sort the splits separately.
Find an efficient technique for merging the data.

We’ll see two examples:
One does most of its work splitting.
One does most of its work merging.