CSC 213 – Large Scale Programming. Today’s Goals Review discussion of merge sort and quick sort...

LECTURE 26: BUCKET SORT & RADIX SORT CSC 213 – Large Scale Programming


Today’s Goals

Review discussion of merge sort and quick sort
  How do they work & why divide-and-conquer?
  Are they fastest possible sorts?
Another way to sort data presented
  How can we sort data with single simple value?
  What are limits on using buckets to sort our data?
  If we want more buckets, can we expand these limits?
  How does radix sort work? How long does it need?


Quick Sort v. Merge Sort

Quick Sort
Divide data around pivot
  Want pivot to be near middle
  All comparisons occur here
Conquer with recursion
  Does not need extra space
Merge usually done already
  Data already sorted!

Merge Sort
Divide data blindly in half
  Always gets even split
  No comparisons performed!
Conquer with recursion
  Needs* to use other arrays
Merge combines solutions
  Compares from (sorted) halves


Complexity of Sorting

With n! external nodes, binary tree's height is at least log (n!)
  This minimum height is the minimum time
  log (n!) is O(n log n)

[Figure: decision tree with comparisons (xi < xj ?, xa < xb ?, xc < xd ?) at its internal nodes and n! external nodes]
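The step from log (n!) to n log n is not spelled out on the slide. One standard way to see it is sketched here, assuming base-2 logarithms (the slide does not state a base):

```latex
\log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i
\;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
\;\ge\; \frac{n}{2}\,\log_2\frac{n}{2},
\qquad
\log_2(n!) \;\le\; n \log_2 n .
```

Both bounds grow like n log n, so the minimum height of the decision tree does too.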


Bucket-Sort

Buckets, B, is array of Sequence
Sorts Collection, C, in two phases:

1. Remove each element v from C & add to B[v]

2. Move elements from each bucket back to C


Bucket-Sort Algorithm

Algorithm bucketSort(Sequence<Integer> C)
  B = new Sequence[10]     // & instantiate each Sequence
  // Phase 1
  for each element v in C
    B[v].addLast(v)        // Assumes each number in C between 0 & 9
  endfor
  // Phase 2
  loc = 0
  for each Sequence b in B
    for each element v in b
      C.set(loc, v)
      loc += 1
    endfor
  endfor
  return C
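The pseudocode above can be sketched in Java roughly as follows. This is a minimal sketch, not the course's own code: it assumes java.util.List stands in for the Sequence type, the class name BucketSortSketch is made up for illustration, and it returns a new list instead of writing back into C with set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BucketSortSketch {
    /** Sorts values assumed to lie in 0..9, following the two phases of the pseudocode. */
    static List<Integer> bucketSort(List<Integer> c) {
        // One bucket per legal value; every bucket is instantiated up front.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: add each element to the bucket named by its value.
        for (int v : c) {
            if (v < 0 || v > 9) {
                throw new IllegalArgumentException("value out of range 0..9: " + v);
            }
            buckets.get(v).add(v);
        }
        // Phase 2: move elements from each bucket back, in bucket order.
        List<Integer> sorted = new ArrayList<>(c.size());
        for (List<Integer> b : buckets) {
            sorted.addAll(b);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // prints [0, 1, 3, 3, 7, 7, 9]
        System.out.println(bucketSort(Arrays.asList(7, 1, 3, 7, 0, 9, 3)));
    }
}
```

No element is ever compared with another; the value itself picks the bucket.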


Bucket Sort Properties

For this to work, values must be legal indices
  Non-negative integer indices needed to access arrays
Sorting occurs without comparing objects
Stable sort describes any sort that preserves relative ordering of objects with same value
  BUCKET-SORT is stable
  (BUBBLE-SORT & MERGE-SORT are other stable sorts)
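Stability is easier to see when equal values can be told apart. The sketch below assumes a made-up Item type (a key in 0..9 plus a label); neither the type nor the method name comes from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StableBucketDemo {
    /** Hypothetical record: a key in 0..9 plus a label to tell equal keys apart. */
    static final class Item {
        final int key;
        final String label;

        Item(int key, String label) {
            this.key = key;
            this.label = label;
        }

        @Override
        public String toString() {
            return key + label;
        }
    }

    /** Bucket sort on the key only; labels are never inspected. */
    static List<Item> bucketSortByKey(List<Item> items) {
        List<List<Item>> buckets = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            buckets.add(new ArrayList<>());
        }
        // Appending at the end keeps earlier items ahead of later items with the same key.
        for (Item it : items) {
            buckets.get(it.key).add(it);
        }
        List<Item> out = new ArrayList<>(items.size());
        for (List<Item> b : buckets) {
            out.addAll(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> in = Arrays.asList(
                new Item(3, "a"), new Item(1, "b"), new Item(3, "c"), new Item(1, "d"));
        // prints [1b, 1d, 3a, 3c]
        System.out.println(bucketSortByKey(in));
    }
}
```

Items b and d share key 1 and keep their input order, as do a and c for key 3.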


Bucket Sort Extensions

Use Comparator for BUCKET-SORT
  Get index for v using compare(v, null)
Comparator for booleans could return
  0 when v is false
  1 when v is true
Comparator for US states, could return
  Annual per capita consumption of Jello
  Consumption of jello overall, in cubic feet
  State's ranking by population


Bucket Sort Extensions

State’s ranking by population

1 California
2 Texas
3 New York
4 Florida
5 Illinois
6 Pennsylvania
7 Ohio
8 Michigan
9 Georgia


Bucket Sort Extensions

Extended BUCKET-SORT works with many types
  Limited set of data needed for this to work
  Need way to enumerate values of the set
  ("enumerate" is subtle hint)
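One way the extension could look in Java is sketched below. It is an illustration under assumptions, not the slide's design: the slides get the index from compare(v, null) on a Comparator, while this sketch swaps in java.util.function.ToIntFunction for the same job. The Size enum and the class name are made up, and the sketch reads the "enumerate" hint as pointing at Java enum types, whose ordinal() already numbers the values from 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class KeyedBucketSort {
    /** Hypothetical enumerated type; ordinal() numbers its values 0, 1, 2. */
    enum Size { SMALL, MEDIUM, LARGE }

    /** Bucket sort for any type whose values map to indices 0..numBuckets-1. */
    static <T> List<T> bucketSort(List<T> c, ToIntFunction<T> indexOf, int numBuckets) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: the index function, not a comparison, chooses the bucket.
        for (T v : c) {
            buckets.get(indexOf.applyAsInt(v)).add(v);
        }
        // Phase 2: concatenate the buckets in index order.
        List<T> out = new ArrayList<>(c.size());
        for (List<T> b : buckets) {
            out.addAll(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Boolean> flags = Arrays.asList(true, false, true, false);
        // prints [false, false, true, true]
        System.out.println(bucketSort(flags, b -> b ? 1 : 0, 2));

        List<Size> sizes = Arrays.asList(Size.LARGE, Size.SMALL, Size.MEDIUM, Size.SMALL);
        // prints [SMALL, SMALL, MEDIUM, LARGE]
        System.out.println(bucketSort(sizes, Size::ordinal, Size.values().length));
    }
}
```

The number of buckets equals the number of values in the set, which is why the set has to be limited.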


d-Tuples

Combination of d values such as (k1, k2, …, kd)
  ki is ith dimension of the tuple
A point (x, y, z) is 3-tuple
  x is 1st dimension's value
  Value of 2nd dimension is y
  z is 3rd dimension's value


Lexicographic Order

Assume a & b are both d-tuples
  a = (a1, a2, …, ad)
  b = (b1, b2, …, bd)
Can say a < b if and only if
  a1 < b1 OR
  a1 = b1 && (a2, …, ad) < (b2, …, bd)
Order these 2-tuples using previous definition
  (3 4) (7 8) (3 2) (1 4) (4 8)
  Sorted: (1 4) (3 2) (3 4) (4 8) (7 8)
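The definition translates almost directly into a comparison routine. The sketch below assumes tuples are stored as int arrays of equal length; the class name LexOrder is made up. It uses the library's comparison sort only to check the definition against the 2-tuples on the slide, not as a radix sort.

```java
import java.util.Arrays;

final class LexOrder {
    /** Returns a negative, zero, or positive value as a comes before, equals, or comes after b. */
    static int compare(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("tuples must have the same number of dimensions");
        }
        // The first dimension where the tuples differ decides the order.
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0; // every dimension matched
    }

    public static void main(String[] args) {
        int[][] tuples = { {3, 4}, {7, 8}, {3, 2}, {1, 4}, {4, 8} };
        Arrays.sort(tuples, LexOrder::compare);
        // prints [[1, 4], [3, 2], [3, 4], [4, 8], [7, 8]]
        System.out.println(Arrays.deepToString(tuples));
    }
}
```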


Radix-Sort

Very fast sort for data expressed as d-tuple
  Cheats to win; faster than lower bound for comparison-based sorting
Sort performed using d calls to bucket sort
  Sorts least to most important dimension of tuple
Luckily lots of data are d-tuples
  String is d-tuple of char
  "L E T T E R S"
  "L I N G E R S"


Digits of an int can be used for sorting, also
  1 0 0 1 3 7 2 9
  1 0 0 9 2 2 1 0


Radix-Sort For Integers

Represent int as a d-tuple of digits:
  62 (base 10) = 111110 (base 2)
  04 (base 10) = 000100 (base 2)
Decimal digits need 10 buckets to use for sorting
Ordering using their bits needs 2 buckets
O(d∙n) time needed to run RADIX-SORT
  d is length of longest element in input
  In most cases value of d is constant (d = 31 for non-negative int)
  Radix sort takes O(n) time, ignoring constant
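The 10-bucket variant could be sketched as below, assuming non-negative ints and a caller-supplied digit count d. The class and parameter names are made up for illustration. Each pass is one bucket sort on a single decimal digit, least significant first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DecimalRadixSort {
    /** Sorts non-negative ints that have at most d decimal digits; returns a sorted copy. */
    static int[] radixSort(int[] values, int d) {
        int[] c = Arrays.copyOf(values, values.length);
        long divisor = 1; // 1, 10, 100, ... selects the digit used by the current pass
        for (int pass = 0; pass < d; pass++) {
            List<List<Integer>> buckets = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                buckets.add(new ArrayList<>());
            }
            // Phase 1: bucket by the current digit; appending keeps the pass stable.
            for (int v : c) {
                int digit = (int) ((v / divisor) % 10);
                buckets.get(digit).add(v);
            }
            // Phase 2: copy the buckets back in digit order.
            int loc = 0;
            for (List<Integer> b : buckets) {
                for (int v : b) {
                    c[loc++] = v;
                }
            }
            divisor *= 10;
        }
        return c;
    }

    public static void main(String[] args) {
        int[] sorted = radixSort(new int[] {10092210, 62, 10013729, 4}, 8);
        // prints [4, 62, 10013729, 10092210]
        System.out.println(Arrays.toString(sorted));
    }
}
```

The outer loop runs d times and each pass touches every element twice, which is where the O(d∙n) bound comes from.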


Radix-Sort In Action

List of 4-bit integers sorted using RADIX-SORT (bit 0 is least significant)

Input   After bit 0   After bit 1   After bit 2   After bit 3
1001    0010          1001          1001          0001
0010    1110          1101          0001          0010
1101    1001          0001          0010          1001
0001    1101          0010          1101          1101
1110    0001          1110          1110          1110
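The passes in this example can be reproduced with a few lines of Java. This is only a sketch: the class and method names are made up, and each pass scans the list twice (first for 0 bits, then for 1 bits) in place of two explicit buckets, which gives the same stable result.

```java
import java.util.ArrayList;
import java.util.List;

final class RadixTrace {
    /** One stable pass: values whose chosen bit is 0 come first, then those with 1. */
    static int[] passOnBit(int[] c, int bit) {
        int[] out = new int[c.length];
        int loc = 0;
        for (int wanted = 0; wanted <= 1; wanted++) {
            for (int v : c) {
                if (((v >> bit) & 1) == wanted) {
                    out[loc++] = v;
                }
            }
        }
        return out;
    }

    /** Formats values as zero-padded binary strings separated by spaces. */
    static String toBits(int[] c, int width) {
        StringBuilder sb = new StringBuilder();
        for (int v : c) {
            String s = Integer.toBinaryString(v);
            while (s.length() < width) {
                s = "0" + s;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(s);
        }
        return sb.toString();
    }

    /** Returns the input followed by the list after each pass, bit 0 first. */
    static List<String> trace(int[] input, int width) {
        List<String> lines = new ArrayList<>();
        int[] c = input.clone();
        lines.add(toBits(c, width));
        for (int bit = 0; bit < width; bit++) {
            c = passOnBit(c, bit);
            lines.add(toBits(c, width));
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] input = {0b1001, 0b0010, 0b1101, 0b0001, 0b1110};
        for (String line : trace(input, 4)) {
            System.out.println(line);
        }
    }
}
```

Run on the slide's input, the last line printed is 0001 0010 1001 1101 1110.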


Radix-Sort

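Algorithm radixSort(Sequence<Integer> C)
  // Works from least to most significant value
  for bit = 0 to 30
    C = bucketSort(C, bit)   // Sort C using the specified bit
  endfor
  return C

What is big-Oh complexity for Radix-Sort?
  Call in loop uses each element twice: O(n)
  Loop repeats once per digit to complete sort: O(1) times when number of digits is fixed
  Total is O(n) * O(1) = O(n)
  But n distinct values need at least log n digits, so loop could repeat O(log n) times (?)
  Total would then be O(n) * O(log n) = O(n log n)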
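A Java version of this pseudocode might look like the sketch below. It assumes java.util.List in place of Sequence and non-negative values (bits 0 to 30, so the sign bit is never examined); the class name is made up, and the two-argument bucketSort is this sketch's guess at what the slide's bucketSort(C, bit) does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BinaryRadixSort {
    /** Stable bucket sort of c on a single bit, using 2 buckets. */
    static List<Integer> bucketSort(List<Integer> c, int bit) {
        List<Integer> zeros = new ArrayList<>();
        List<Integer> ones = new ArrayList<>();
        for (int v : c) {
            if (((v >> bit) & 1) == 0) {
                zeros.add(v);
            } else {
                ones.add(v);
            }
        }
        // Bucket 0 is read back first, then bucket 1.
        zeros.addAll(ones);
        return zeros;
    }

    /** Works from least to most significant bit; assumes every value is non-negative. */
    static List<Integer> radixSort(List<Integer> c) {
        for (int v : c) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
        }
        for (int bit = 0; bit <= 30; bit++) {
            c = bucketSort(c, bit);
        }
        return c;
    }

    public static void main(String[] args) {
        // prints [0, 1, 2, 9, 13, 14, 2147483647]
        System.out.println(radixSort(Arrays.asList(9, 2, 13, 1, 14, Integer.MAX_VALUE, 0)));
    }
}
```

The loop always makes 31 passes here, which is the fixed-d case that yields O(n).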


For Next Lecture

Start thinking about test cases for program #2
  Friday is next deadline when these must be submitted
  Spend time on this: tests & design save coding
Next weekly assignment available tomorrow
  As is usual, this will be due next Tuesday
Reading on Graph ADT for Wednesday
  Note: these have nothing to do with bar charts
  What are mathematical graphs?
  Why are they the basis of everything in CS?