Sortingccf.ee.ntu.edu.tw/~yen/courses/ds19F/chapter-sort.pdf · 2019-12-07 · Straight Radix Sort...

Sorting (Sorting) Data Structures Fall 2019 1 / 43


Sorting


Introduction

One of the most common applications in computer science is sorting, the process through which data are arranged according to their values. If data were not ordered in some way, we would spend an incredible amount of time trying to find the correct information. To appreciate this, imagine trying to find someone's number in the telephone book if the names were not sorted in some way!


General Sorting Concepts

Sorts are generally classified as either internal or external.
- An internal sort is a sort in which all of the data is held in primary memory during the sorting process.
- An external sort uses primary memory for the data currently being sorted and secondary storage for any data that will not fit in primary memory. For example, a file of 20,000 records may be sorted using an array that holds only 1000 records. Therefore only 1000 records are in primary memory at any given time; the other 19,000 records are stored in secondary storage.

Comparison-based sorting: determining order by comparing pairs of elements: <, >, =, ...
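The internal/external distinction can be illustrated with a minimal external merge sort. This is a sketch under stated assumptions: records are integers stored one per line in temporary text files, and the helper name `external_sort` and its `run_size` parameter (standing in for the 1000-record array) are hypothetical. Each run of at most `run_size` records is sorted in primary memory and written to secondary storage. The sorted runs are then combined by a k-way merge (`heapq.merge`) that reads each run lazily.

```python
import heapq
import os
import tempfile
from typing import Iterable, List


def external_sort(records: Iterable[int], run_size: int = 1000) -> List[int]:
    """Hypothetical helper: sort integers keeping at most `run_size` of them
    in memory while forming runs, then k-way merge the runs from disk."""
    run_paths: List[str] = []
    buffer: List[int] = []

    def flush() -> None:
        # Internal sort of one run, then write it to secondary storage.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{x}\n" for x in buffer)
        run_paths.append(path)
        buffer.clear()

    for r in records:
        buffer.append(r)
        if len(buffer) == run_size:
            flush()
    if buffer:
        flush()

    # k-way merge: each run file is read one line at a time.
    files = [open(p) for p in run_paths]
    try:
        # The result is collected in a list here only for demonstration;
        # a real external sort would stream it to an output file.
        merged = list(heapq.merge(*((int(line) for line in f) for f in files)))
    finally:
        for f, p in zip(files, run_paths):
            f.close()
            os.remove(p)
    return merged
```

With 20,000 records and `run_size=1000`, this would produce 20 sorted runs before merging.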


Sort Order and Stability

Data may be sorted in either ascending or descending order. The sort order identifies the sequence of sorted data, ascending or descending. If the order of the sort is not specified, it is assumed to be ascending. Sort stability is an attribute of a sort indicating that data elements with equal keys maintain their relative input order in the output.
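Python's built-in `sorted` is documented to be stable, so it gives a quick illustration of both sort order and stability. The (name, grade) records below are made up for illustration only.

```python
records = [("Alice", 90), ("Bob", 85), ("Carol", 90), ("Dave", 85)]

# Ascending by grade (the default order); equal grades keep their input order.
by_grade = sorted(records, key=lambda r: r[1])

# Descending by grade; reverse=True still preserves the relative order of ties.
by_grade_desc = sorted(records, key=lambda r: r[1], reverse=True)

print(by_grade)       # [('Bob', 85), ('Dave', 85), ('Alice', 90), ('Carol', 90)]
print(by_grade_desc)  # [('Alice', 90), ('Carol', 90), ('Bob', 85), ('Dave', 85)]
```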


Sorting algorithms

Comparison-based sorting:
- bubble sort: swap adjacent pairs that are out of order
- selection sort: look for the smallest element, move it to the front
- insertion sort: build an increasingly large sorted front portion
- merge sort: recursively divide the array in half and sort each half
- heap sort: place the values into a heap-ordered tree structure
- quick sort: recursively partition the array based on a middle value
- ...

Other specialized sorting algorithms:
- bucket sort: cluster elements into smaller groups, sort them
- radix sort: sort integers by last digit, then 2nd to last, then ...


Selection sort

Order a list of values by repeatedly putting the smallest (or largest) unplaced value into its final position.

# comparisons = $\sum_{i=1}^{n}(n-i) = O(n^2)$

# data movements = $O(n)$
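A minimal selection sort sketch makes both counts visible. Pass `i` scans the unplaced portion with n − 1 − i comparisons, and it performs at most one swap. The swap counter is an illustrative addition and is not part of the algorithm.

```python
from typing import List


def selection_sort(a: List[int]) -> int:
    """Sort `a` in place in ascending order; return the number of swaps performed."""
    n = len(a)
    swaps = 0
    for i in range(n - 1):
        # Find the smallest value in the unplaced portion a[i:].
        m = i
        for j in range(i + 1, n):  # n - 1 - i comparisons in this pass
            if a[j] < a[m]:
                m = j
        # Move it into its final position i (at most one swap per pass).
        if m != i:
            a[i], a[m] = a[m], a[i]
            swaps += 1
    return swaps
```

Because each pass swaps at most once, the number of data movements stays O(n) even though the comparisons are quadratic.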


Bubble sort

# comparisons = $\sum_{i=1}^{n}(n-i) = O(n^2)$

# data movements (worst case) = $\sum_{i=1}^{n}(n-i) = O(n^2)$
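Here is a minimal bubble sort sketch that returns its swap count. Pass `i` compares the n − i adjacent pairs that are not yet known to be in place. On a reversed input every comparison causes a swap, which gives the worst case n(n − 1)/2.

```python
from typing import List


def bubble_sort(a: List[int]) -> int:
    """Sort `a` in place by swapping adjacent out-of-order pairs; return the swap count."""
    n = len(a)
    swaps = 0
    for i in range(1, n):
        # Pass i compares the n - i adjacent pairs inside a[0 : n - i + 1];
        # after it, the i largest values sit at the end of the list.
        for j in range(n - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps
```

A common variant stops early when a pass makes no swaps. That variant is left out here so the comparison count matches the formula exactly.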


Insertion sort

Visualize the first part of the list as the sorted portion, which is separated by a conceptual wall from the unsorted portion of the list.

# comparisons (worst case) = $\sum_{i=1}^{n}(n-i) = O(n^2)$

# data movements (worst case) = $\sum_{i=1}^{n}(i-1) = O(n^2)$

For an "almost sorted" sequence, # comparisons/movements = $O(n)$.
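The sketch below is a minimal insertion sort that counts comparisons and element shifts. It shows the quadratic worst case on reversed input and the linear behaviour on sorted input. The counters are illustrative, and `moves` counts only shifts, not the final placement of each key.

```python
from typing import List, Tuple


def insertion_sort(a: List[int]) -> Tuple[int, int]:
    """Sort `a` in place; return (comparisons, data movements)."""
    comparisons = moves = 0
    for i in range(1, len(a)):
        key = a[i]  # next element taken from the unsorted portion
        j = i - 1
        # Shift larger elements of the sorted portion a[0:i] one slot right.
        while j >= 0:
            comparisons += 1
            if a[j] <= key:  # stopping on equal keys keeps the sort stable
                break
            a[j + 1] = a[j]
            moves += 1
            j -= 1
        a[j + 1] = key  # drop key into the gap on the sorted side of the wall
    return comparisons, moves
```

On `[1, 2, 3, 4, 5]` this makes 4 comparisons and 0 moves. On `[5, 4, 3, 2, 1]` it makes 10 of each.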


Shell sort

Invented by Donald Shell in 1959.
- One of the first algorithms to break the quadratic time barrier, although a subquadratic time bound was proven only a few years later.
- Shellsort works by comparing elements that are distant rather than adjacent elements in an array.
- Shellsort uses a sequence $h_1, h_2, \ldots, h_t$ called the increment sequence. Any increment sequence works as long as $h_1 = 1$, but some choices are better than others.
- Shellsort improves on the efficiency of insertion sort by quickly shifting values to their destination.


Shell sort example using sequence (1, 2, 4)

Gap=4

Gap=2


Shell sort example using sequence (1, 2, 4)

Gap=1

Shellsort never does more than $n^{1.5}$ comparisons (for the increments $h = 1, 4, 13, 40, \ldots$). The analysis of this algorithm is hard; two conjectures for its complexity are $n(\log n)^2$ and $n^{1.25}$.
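A minimal Shellsort sketch, assuming the increments 1, 4, 13, 40, ... (each is 3h + 1) as the default. The helper name `knuth_gaps` is hypothetical. Any other sequence that ends in 1, such as the (4, 2, 1) of the example, can be passed instead.

```python
from typing import List, Optional


def knuth_gaps(n: int) -> List[int]:
    """Increments 1, 4, 13, 40, ... (h = 3h + 1) below n, largest first."""
    gaps = [1]
    while 3 * gaps[-1] + 1 < n:
        gaps.append(3 * gaps[-1] + 1)
    return gaps[::-1]


def shell_sort(a: List[int], gaps: Optional[List[int]] = None) -> None:
    """Sort `a` in place; `gaps` must end with 1 (defaults to knuth_gaps)."""
    if gaps is None:
        gaps = knuth_gaps(len(a))
    for h in gaps:
        # Gapped insertion sort: each element is compared with the one h slots
        # back, so a value can travel far toward its destination in one step.
        for i in range(h, len(a)):
            key = a[i]
            j = i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
            a[j] = key
```

The final pass with h = 1 is plain insertion sort. By that point the list is almost sorted, so that pass is cheap.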


Merge sort

Repeatedly divides the data in half, sorts each half, and combines the sorted halves into a sorted whole.

- Divide the list into two roughly equal halves.
- Sort the left half.
- Sort the right half.
- Merge the two sorted halves into one sorted list.

- Often implemented recursively.
- An example of a "divide and conquer" algorithm. Invented by John von Neumann in 1945.
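The four steps above map directly onto a short recursive sketch. This version returns a new list rather than sorting in place, which is a simplifying choice. Merging two sorted lists takes time linear in their combined length.

```python
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from `left` on ties keeps the sort stable.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two slices is non-empty
    out.extend(right[j:])
    return out


def merge_sort(a: List[int]) -> List[int]:
    """Return a new sorted list using divide and conquer."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2                   # divide into two roughly equal halves
    return merge(merge_sort(a[:mid]),   # sort the left half
                 merge_sort(a[mid:]))   # sort the right half, then merge
```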


Merge sort example


Merging two sorted subarrays


Quicksort

Follows the divide-and-conquer paradigm.
Divide: Partition (separate) the array A[p..r] into two (possibly empty) subarrays A[p..q−1] and A[q+1..r] such that
- each element in A[p..q−1] ≤ A[q],
- A[q] ≤ each element in A[q+1..r],
- index q is computed as part of the partitioning procedure.

Conquer: Sort the two subarrays by recursive calls to quicksort.
Combine: Because the subarrays are sorted in place, no work is needed to combine them.
How do the divide and combine steps of quicksort compare with those of merge sort?
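The text does not fix a partitioning procedure. This sketch assumes the Lomuto scheme, which uses the last element A[r] as the pivot, and keeps the A[p..r] and q notation. Bounds are inclusive, so the whole list is sorted with `quicksort(A, 0, len(A) - 1)`.

```python
from typing import List


def partition(A: List[int], p: int, r: int) -> int:
    """Lomuto partition of A[p..r] around the pivot x = A[r]; return its final index q."""
    x = A[r]
    i = p - 1  # invariant: A[p..i] holds elements <= x
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # place the pivot between the two parts
    return i + 1


def quicksort(A: List[int], p: int, r: int) -> None:
    """Sort A[p..r] (inclusive bounds) in place."""
    if p < r:
        q = partition(A, p, r)   # divide
        quicksort(A, p, q - 1)   # conquer: left part, all <= A[q]
        quicksort(A, q + 1, r)   # conquer: right part, all > A[q]
        # combine: nothing to do, the subarrays were sorted in place
```

Compared with merge sort, the real work happens in the divide step (partition), and combining is free. Merge sort divides trivially and does its work in the merge.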


Quicksort - example


Quicksort - example (cont’d)


Quicksort - best case and worst case

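Best case: $O(n \log n)$, when every partition splits the subarray into two parts of nearly equal size.

Worst case: $O(n^2)$, when the pivot is always the smallest or largest element (e.g., an already sorted array with the last element as pivot).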
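The worst case can be observed directly by counting comparisons, assuming a last-element pivot with Lomuto partitioning. The helper name `quicksort_comparisons` is hypothetical. On already sorted input every partition is maximally unbalanced, so the count is (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2.

```python
from typing import List


def quicksort_comparisons(A: List[int]) -> int:
    """Quicksort a copy of A (last element as pivot, Lomuto partition)
    and return the number of element comparisons made."""
    A = list(A)
    count = 0
    stack = [(0, len(A) - 1)]  # explicit stack avoids deep recursion on bad inputs
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = A[r], p - 1
        for j in range(p, r):
            count += 1  # one comparison A[j] <= x
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        q = i + 1
        stack.append((p, q - 1))
        stack.append((q + 1, r))
    return count


print(quicksort_comparisons(list(range(100))))  # 4950 = 100 * 99 / 2
```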


Quicksort - average case

$$T(n) = \frac{1}{n}\sum_{k=1}^{n}\Big(\underbrace{T(k-1) + T(n-k)}_{\text{pivot at position } k}\Big) + \underbrace{n + 1}_{\text{partition}}$$

$$T(n) = \frac{2}{n}\sum_{k=0}^{n-1} T(k) + n + 1$$

$$nT(n) = 2\sum_{k=0}^{n-1} T(k) + (n + 1)n$$

$$(n-1)T(n-1) = 2\sum_{k=0}^{n-2} T(k) + n(n-1)$$

$$nT(n) - (n-1)T(n-1) = 2T(n-1) + 2n$$

$$nT(n) = (n-1)T(n-1) + 2T(n-1) + 2n = (n+1)T(n-1) + 2n$$


Quicksort - average case

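Dividing both sides by $n(n+1)$ and unrolling:

$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2}{n+1}$$

$$= \frac{T(n-2)}{n-1} + \frac{2}{n} + \frac{2}{n+1}$$

$$= \frac{T(n-3)}{n-2} + \frac{2}{n-1} + \frac{2}{n} + \frac{2}{n+1}$$

$$\vdots$$

$$= \frac{T(n-k)}{n-k+1} + 2\left(\frac{1}{n-k+2} + \cdots + \frac{1}{n} + \frac{1}{n+1}\right)$$

Setting $k = n$ gives $\frac{T(n)}{n+1} = T(0) + 2\sum_{j=2}^{n+1}\frac{1}{j}$.

Note: $\sum_{j=2}^{n+1}\frac{1}{j} \le \int_{1}^{n+1}\frac{dx}{x} = \ln(n+1) = O(\log n)$

Hence, $T(n) = O(n \log n)$.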
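The derivation can be checked numerically. This sketch evaluates the averaged recurrence exactly with rational arithmetic, assuming T(0) = 0, which the text does not fix. The helper name `avg_quicksort_cost` is hypothetical. The resulting values can then be compared against the two derived identities.

```python
from fractions import Fraction
from typing import List


def avg_quicksort_cost(n_max: int) -> List[Fraction]:
    """Exact T(0..n_max) from T(n) = (2/n) * sum_{k=0}^{n-1} T(k) + n + 1, with T(0) = 0."""
    T = [Fraction(0)]
    prefix = Fraction(0)  # running sum T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        prefix += T[n - 1]
        T.append(Fraction(2, n) * prefix + n + 1)
    return T


T = avg_quicksort_cost(50)
print(T[1], T[2], T[3])  # 2 5 26/3
```

For every n, these values satisfy both nT(n) = (n + 1)T(n − 1) + 2n and T(n)/(n + 1) = T(0) + 2(1/2 + ... + 1/(n + 1)).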


Quicksort - implementation

Almost anything you can try to improve Quicksort will actually slow it down. One good tweak is to switch to a different sorting method when the subarrays get small (say, 10 or 12 elements), to avoid too much overhead for small array sizes.

Randomized Quicksort: select the pivot as a random element of the sequence. The expected running time of randomized quicksort on a sequence of size n is O(n log n).

Median of three: compare just three elements of the (sub)array (the first, the last, and the middle) and take the median (middle value) of these three as the pivot. Median of three is a good technique for choosing the pivot.
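The three tweaks fit together in one sketch: an insertion-sort cutoff for small subarrays, a median-of-three pivot, and an optional random pivot. The names `tuned_quicksort`, `median_of_three`, `insertion_sort_range` and the `CUTOFF` value of 10 are assumptions for illustration. The chosen pivot is swapped to the end so that the same Lomuto partition can be reused.

```python
import random
from typing import List, Optional

CUTOFF = 10  # hypothetical threshold; the text suggests "say, 10 or 12"


def insertion_sort_range(a: List[int], lo: int, hi: int) -> None:
    """Insertion sort a[lo..hi] (inclusive), used for small subarrays."""
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def median_of_three(a: List[int], lo: int, hi: int) -> int:
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]


def tuned_quicksort(a: List[int], lo: int = 0, hi: Optional[int] = None,
                    randomized: bool = False) -> None:
    """Sort a[lo..hi] in place with a small-subarray cutoff and a tuned pivot."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:  # also covers empty ranges
        insertion_sort_range(a, lo, hi)
        return
    # Random pivot or median of three, moved to a[hi] for Lomuto partitioning.
    k = random.randint(lo, hi) if randomized else median_of_three(a, lo, hi)
    a[k], a[hi] = a[hi], a[k]
    x, i = a[hi], lo - 1
    for j in range(lo, hi):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    q = i + 1
    tuned_quicksort(a, lo, q - 1, randomized)
    tuned_quicksort(a, q + 1, hi, randomized)
```

Median of three keeps already sorted input from triggering the quadratic case. Lomuto partitioning still degrades on inputs with very many equal keys, and a three-way partition is a common remedy (not shown).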


Time complexity of Sorting

Several sorting algorithms have been discussed; the best ones so far are:

- Heap sort and merge sort: O(n log n)
- Quick sort (best one in practice): O(n log n) on average, O(n^2) worst case

Can we do better than O(n log n)?

- No, not with a comparison-based algorithm.
- It can be proven that any comparison-based sorting algorithm needs at least Ω(n log n) comparisons in the worst case.


Restrictions on the problem

Suppose the values in the list to be sorted can repeat but the values have a limit (e.g., values are digits from 0 to 9). Sorting, in this case, appears easier. Is it possible to come up with an algorithm better than O(n log n)?

- Yes.
- The strategy will not involve comparisons.


Bucket sort

Idea: suppose the values are in the range 0..m−1. Start with m empty buckets numbered 0 to m−1, scan the list and place element s[i] in bucket s[i], and then output the buckets in order. This needs an array of buckets, and the values in the list to be sorted serve as the indexes into the buckets.

Time complexity is O(n + m): one pass over the n elements plus one pass over the m buckets.
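A minimal bucket sort sketch for integer values in 0..m−1, assuming the bucket count `m` is passed in explicitly. Each bucket is a list, so elements with equal keys come out in scan order. That makes the sort stable, which the radix sorts below depend on.

```python
from typing import List


def bucket_sort(s: List[int], m: int) -> List[int]:
    """Sort integers in the range 0..m-1 by placing s[i] in bucket s[i]; O(n + m)."""
    buckets: List[List[int]] = [[] for _ in range(m)]  # m empty buckets
    for x in s:                  # one scan over the n input values
        buckets[x].append(x)     # the value itself is the bucket index
    out: List[int] = []
    for b in buckets:            # output the buckets in order
        out.extend(b)
    return out
```

With satellite data, the whole record would be appended to the bucket chosen by its key.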


Straight Radix Sort

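If your integers are in a larger range, then do bucket sort on each digit. Start by sorting with the low-order digit using a STABLE bucket sort, then do the next-lowest, and so on.

Time complexity = O(bn), where b is the number of digits per key (one O(n) bucket-sort pass per digit, for a fixed number of buckets).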
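A minimal straight (LSD) radix sort sketch, assuming non-negative integer keys and a configurable `base`; decimal digits are the default. Each pass is a stable bucket sort on one digit, with the low-order digit handled first. b passes over n keys give O(bn).

```python
from typing import List


def radix_sort(a: List[int], base: int = 10) -> List[int]:
    """LSD (straight) radix sort of non-negative integers; returns a new list."""
    if not a:
        return []
    b = 1
    while base ** b <= max(a):   # b = number of base-`base` digits in the largest key
        b += 1
    for d in range(b):           # low-order digit first
        buckets: List[List[int]] = [[] for _ in range(base)]
        for x in a:              # scanning in current order + appending = stable
            buckets[(x // base ** d) % base].append(x)
        a = [x for bucket in buckets for x in bucket]
    return a
```

With `base=2` each pass sorts on one bit, which matches the bit-position view used in the correctness argument.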


Straight Radix Sort - example


Straight Radix Sort - example (cont’d)


Straight Radix Sort - correctness

We show that any two keys are in the correct relative order at the end of the algorithm. Given two keys, let k be the leftmost bit position where they differ. At step k, the two keys are put in the correct relative order. Because of stability, the successive steps do not change the relative order of the two keys, since the keys have equal bits in every position to the left of k.


Radix Exchange Sort


Radix Exchange Sort (cont’d)

Time complexity = O(bn), where b is the number of bits per key.


Radix Exchange Sort vs. Quicksort

Similarities
- both partition the array
- both recursively sort subarrays

Differences
- Method of partitioning
  - radix exchange divides the array based on greater than or less than $2^{b-1}$
  - quicksort partitions based on greater than or less than some element of the array
- Time complexity
  - Radix exchange: O(bn)
  - Quicksort: O(n log n) on average
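To make the comparison concrete, here is a minimal radix exchange sort sketch, assuming b-bit non-negative integer keys. Like quicksort, it partitions in place with two scanning indices and recurses on both parts. The split, however, is on the current bit rather than on a pivot element, so the top level separates keys below and above $2^{b-1}$. Each of the at most b levels touches every key once, which gives O(bn).

```python
from typing import List


def radix_exchange_sort(a: List[int], b: int) -> None:
    """Sort b-bit non-negative integers in place, partitioning on bits from the top."""

    def sort(lo: int, hi: int, bit: int) -> None:
        if lo >= hi or bit < 0:
            return
        i, j = lo, hi
        # Partition: keys with this bit = 0 go left, bit = 1 go right.
        while i <= j:
            while i <= j and not (a[i] >> bit) & 1:
                i += 1
            while i <= j and (a[j] >> bit) & 1:
                j -= 1
            if i < j:
                a[i], a[j] = a[j], a[i]
        # Now a[lo..j] have bit 0 and a[i..hi] have bit 1 (i == j + 1).
        sort(lo, j, bit - 1)
        sort(i, hi, bit - 1)

    sort(0, len(a) - 1, b - 1)
```

Like quicksort, this partitioning swaps elements across long distances, so the sort is not stable.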


Lower bound on comparison sorting – decision tree

We may describe the behavior of a comparison-based sorting algorithm S on an input array A = 〈A[1], ..., A[n]〉 by a decision tree: each internal node is a comparison between two elements, and each branch corresponds to one outcome of that comparison.

At each leaf of the tree, the output of the algorithm on the corresponding execution branch is displayed. Outputs of sorting algorithms correspond to permutations of the input array.


Decision tree - example

Insertion sort for n = 3

In insertion sort, when we get the result of a comparison, we often swap some elements of the array. In showing decision trees, we don't implement a swap: our indices always refer to the original elements at that position in the array. To understand what this means, draw the evolving array of insertion sort beside this decision tree.


Size of Decision Tree

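When dealing with n elements we have n! possible arrangements, and each must appear at some leaf. A binary tree of height h has at most $2^h$ leaves, so the decision tree needs height at least $\lceil \log_2 n! \rceil$. Since $\log_2 n! = \Theta(n \log n)$, every comparison-based sort makes $\Omega(n \log n)$ comparisons in the worst case.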
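The bound can be computed exactly for small n. This sketch uses the integer identity ceil(log2 x) = (x − 1).bit_length() for x ≥ 1 to avoid floating-point rounding. The helper name `min_decision_tree_height` is hypothetical. For n = 3 it gives 3, matching the height of the insertion-sort tree for three elements.

```python
from math import factorial, log2


def min_decision_tree_height(n: int) -> int:
    """ceil(log2(n!)): a binary tree of height h has at most 2**h leaves."""
    return (factorial(n) - 1).bit_length()


# Compare the exact lower bound with n log2 n for a few sizes.
for n in (3, 4, 10, 100):
    print(n, min_decision_tree_height(n), round(n * log2(n)))
```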


Characteristic Diagrams


Selection Sorting a Random Permutation


Insertion Sorting a Random Permutation


Shell Sorting a Random Permutation


Merge Sorting a Random Permutation


Straight Radix Sort


Quicksort


Heapsorting a Random Permutation: Construction


Heapsorting (Sorting Phase)


Bubble Sorting a Random Permutation
