Post on 20-Dec-2015
CSC 213 – Large Scale Programming
Lecture 24: Radix & Bucket Sorts
Today’s Goal
Discuss two new ways to sort data
  Very different from other forms of sorting
  Can* be a much faster method of sorting
  Follows simple pattern, but confusing to learn
Bucket-Sort
Uses B, an array of Sequences (i.e., buckets)
Sorts Sequence S in two phases:
1. While S is not empty, remove first Entry <k, v> from S and add it to end of B[k]
2. For i ← 0, …, B.size()-1, move entries from bucket B[i] to end of S
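Bucket-Sort Example
Suppose keys range over [0, 9]
S before Phase 1: (7, d) (1, c) (3, a) (7, g) (3, b) (7, e)
B after Phase 1 (indices 0 to 9; buckets not listed are empty):
  B[1]: (1, c)
  B[3]: (3, a) (3, b)
  B[7]: (7, d) (7, g) (7, e)
S after Phase 2: (1, c) (3, a) (3, b) (7, d) (7, g) (7, e)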
Bucket-Sort Algorithm
Algorithm bucketSort(Sequence<E> S, Comparator<E> c)
  B ← new Sequence[c.getMaxKey()]
  // instantiate the Sequence at each index within B
  while ¬S.isEmpty() do // Phase 1
    entry ← S.removeFirst()
    B[c.compare(entry, null)].insertLast(entry)
  for i ← 0 to B.length - 1 // Phase 2
    while ¬B[i].isEmpty() do
      entry ← B[i].removeFirst()
      S.insertLast(entry)
  return S
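The pseudocode above can be sketched in Java roughly as follows. This is only an illustration under assumptions: it uses `java.util.Deque` in place of the course's `Sequence`, and a hypothetical `keyIndex` function in place of `c.compare(entry, null)`; the class and parameter names are not from the lecture.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative sketch; names are assumptions, not the course's API.
class BucketSortSketch {

    // Sorts s in place. keyIndex must map every entry to an index in [0, numBuckets - 1].
    static <E> void bucketSort(Deque<E> s, int numBuckets, ToIntFunction<E> keyIndex) {
        // Instantiate an empty bucket at each index.
        List<Deque<E>> buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayDeque<>());
        }
        // Phase 1: move each entry from s to the end of its bucket.
        while (!s.isEmpty()) {
            E entry = s.removeFirst();
            buckets.get(keyIndex.applyAsInt(entry)).addLast(entry);
        }
        // Phase 2: empty the buckets, in index order, back into s.
        for (Deque<E> bucket : buckets) {
            while (!bucket.isEmpty()) {
                s.addLast(bucket.removeFirst());
            }
        }
    }

    public static void main(String[] args) {
        // Entries written as "<key><value>", e.g. "7d" has key 7 and value d.
        Deque<String> s = new ArrayDeque<>(Arrays.asList("7d", "1c", "3a", "7g", "3b", "7e"));
        bucketSort(s, 10, e -> e.charAt(0) - '0');
        System.out.println(s);
    }
}
```

Because each bucket is filled and emptied in first-in, first-out order, entries with the same key keep their relative order in this sketch.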
Bucket-Sort Properties
Keys are indices into array
  Must be non-negative integers
  Does not require external Comparator
Sort is stable
  Two entries with same key keep relative ordering
  Bubble-sort & Merge-sort also stable
Bucket-Sort Extensions
Extend Bucket-sort with Comparator
  Specify maximum number of buckets
  c.compare(key, null) returns index for key
For Integer keys from a to b:
  Comparator maps k to k – a
For Boolean keys, Comparator returns:
  0 when the key is false
  1 when the key is true
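The two key mappings above might look like this in Java. The class and method names are hypothetical, chosen only for illustration; the range check is an added assumption, since the slide does not say what happens to keys outside a to b.

```java
// Illustrative sketch; names are assumptions, not the course's Comparator.
class KeyIndexSketch {

    // Integer keys in [a, b] map to bucket index k - a, so b - a + 1 buckets are needed.
    static int rangeIndex(int k, int a, int b) {
        if (k < a || k > b) {
            throw new IllegalArgumentException("key " + k + " is outside [" + a + ", " + b + "]");
        }
        return k - a;
    }

    // Boolean keys need two buckets: false goes to 0, true goes to 1.
    static int booleanIndex(boolean key) {
        return key ? 1 : 0;
    }
}
```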
Bucket-Sort Extensions
Use Bucket-sort with any keys
  Keys must be from bounded set, D, of values
  D could be U.S. states, molecular structures, To-Whack items on assassin's hit list…
Comparator ranks each value in D
  Rank states alphabetically or by admission order
  Ranks used as index into bucket array, B
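One way to picture ranking a bounded set D is a Java enum, whose `ordinal()` gives each value's position in declaration order. The `Priority` enum below is a made-up stand-in for D, and using `ordinal()` as the rank is an assumption for this sketch, not something the lecture prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; Priority is a hypothetical bounded set D.
class EnumBucketSketch {

    enum Priority { LOW, MEDIUM, HIGH }

    // Returns a new list ordered by rank, where rank = ordinal() = bucket index.
    static List<Priority> sortByRank(List<Priority> input) {
        int numBuckets = Priority.values().length;
        List<List<Priority>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        // Phase 1: drop each value into the bucket for its rank.
        for (Priority p : input) {
            buckets.get(p.ordinal()).add(p);
        }
        // Phase 2: concatenate the buckets in rank order.
        List<Priority> sorted = new ArrayList<>();
        for (List<Priority> bucket : buckets) {
            sorted.addAll(bucket);
        }
        return sorted;
    }
}
```

A different ranking (for example, alphabetical instead of declaration order) would only change the function that produces the bucket index.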
d-Tuples
Combination of d keys (k1, k2, …, kd)
  ki is “i-th dimension of the tuple”
Example: Point p = (x, y) is 2-tuple
  x is value of 1st dimension
  y is value of 2nd dimension
Lexicographic Order
Order of d-tuples defined recursively:
(x1, x2, …, xd) < (y1, y2, …, yd)
  ⇔ x1 < y1 ∨ (x1 = y1 ∧ (x2, …, xd) < (y2, …, yd))
Lexicographic order of 2-tuples?
  (3, 4) (7, 8) (3, 2) (1, 4) (4, 8)
  (1, 4) (3, 2) (3, 4) (4, 8) (7, 8)
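The recursive definition translates fairly directly into a comparison routine. The sketch below assumes tuples are stored as `int[]` arrays of equal length; the class and method names are invented for illustration.

```java
// Illustrative sketch; assumes tuples are int[] arrays of the same dimension.
class LexOrderSketch {

    // Compares dimensions from..d-1: negative if x < y, zero if equal, positive if x > y.
    static int compareFrom(int[] x, int[] y, int from) {
        if (from == x.length) {
            return 0; // no dimensions left, so the tuples are equal
        }
        if (x[from] != y[from]) {
            return Integer.compare(x[from], y[from]); // first differing dimension decides
        }
        return compareFrom(x, y, from + 1); // tie: recurse on the remaining dimensions
    }

    static int compare(int[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("tuples must have the same dimension");
        }
        return compareFrom(x, y, 0);
    }
}
```

Sorting the five 2-tuples from the question with this comparison gives the order shown in the answer line.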
Lexicographic Sorting
Uses d calls to stable sorting algorithm
  Each call sorts along single dimension of tuple
  So must sort from least significant dimension (d) to most significant (1)
Algorithm lexicographicSort(Sequence<E> s, Comparator<E> c, Sort<E> stableSort)
  for i ← c.size() downto 1
    stableSort.sort(s, c, i)
  return s
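As a rough Java illustration of the loop above, the sketch below sorts `int[]` tuples one dimension at a time, last dimension first. It leans on `Collections.sort`, which the Java documentation guarantees to be stable, as a stand-in for `stableSort`; the class name and the 0-based dimension indexing are assumptions of this sketch.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch; Collections.sort stands in for the lecture's stableSort.
class LexSortSketch {

    // Sorts d-tuples lexicographically using d stable single-dimension sorts.
    static void lexicographicSort(List<int[]> s, int d) {
        // Dimensions are 0-based here, so this runs d-1 down to 0 (d down to 1 on the slide).
        for (int i = d - 1; i >= 0; i--) {
            final int dim = i;
            // Stable sort on dimension dim alone; earlier passes survive as tie-breakers.
            Collections.sort(s, Comparator.comparingInt((int[] t) -> t[dim]));
        }
    }
}
```

The order matters: if the first dimension were sorted first, the later passes on less significant dimensions would scramble it.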
Radix-Sort
Lexicographical sort using Bucket-sort
  Good for tuples where each dimension made into index in the range [0, N-1]
  Compare each character in two Strings
  Compare each bit in two Integers
Requires modification to comparator
  key still first parameter to compare
  But, dimension now passed as second parameter
Radix-Sort for Integers
Represent Integer as a tuple of bits:
  62₁₀ = 111110₂
  04₁₀ = 000100₂
With decimal representation, need 10 buckets
With binary representation, need 2 buckets
Radix-sort runs in O(bn) time
  b is the length of longest element in input
  For 32-bit integers, b = 32
  Takes O(32n) time ≈ O(n) time!
Radix-Sort for Integers
Algorithm binaryRadixSort(Sequence<E> S, Comparator<E> c)
  for i ← 0 to c.size() - 1
    bucketSort(S, 2, i, c)
  return S
Value of the ith bit of Integer k is:
((k >> i) & 1)
Example
Sorting a sequence of 4-bit integers (one stable pass per bit, starting with bit 0)
Input   Bit 0   Bit 1   Bit 2   Bit 3
1001    0010    1001    1001    0001
0010    1110    1101    0001    0010
1101    1001    0001    0010    1001
0001    1101    0010    1101    1101
1110    0001    1110    1110    1110
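The passes traced in the example above can be reproduced with a short Java sketch. It works on a plain `int[]` rather than a `Sequence`, takes the number of bits as a parameter in place of `c.size()`, and assumes the keys are non-negative; all names are chosen for illustration.

```java
// Illustrative sketch; assumes non-negative keys that fit in numBits bits.
class BinaryRadixSortSketch {

    // Value of the i-th bit of k, as on the slide.
    static int bit(int k, int i) {
        return (k >> i) & 1;
    }

    // Sorts a in place with numBits stable two-bucket passes, least significant bit first.
    static void binaryRadixSort(int[] a, int numBits) {
        int[] buffer = new int[a.length];
        for (int i = 0; i < numBits; i++) {
            int next = 0;
            // Bucket 0: keys whose i-th bit is 0, in their current order.
            for (int k : a) {
                if (bit(k, i) == 0) {
                    buffer[next++] = k;
                }
            }
            // Bucket 1: keys whose i-th bit is 1, in their current order.
            for (int k : a) {
                if (bit(k, i) == 1) {
                    buffer[next++] = k;
                }
            }
            System.arraycopy(buffer, 0, a, 0, a.length);
        }
    }
}
```

Calling `binaryRadixSort` with the five input values from the example and `numBits = 4` goes through the same intermediate orders, one per bit.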
For Friday…
Come with questions to review for Midterm
  Last chance to ask me about material
  Also have sort-based problems to discuss