Post on 14-Oct-2020

Internal Sorting 1

S. Thiel

Sorting

Linear Sorts

nlogn Sorts

Quicksort Analysis

References

Internal Sorting 1

S. Thiel

Department of Computer Science & Software Engineering, Concordia University

July 9, 2019

Outline

Sorting

Linear Sorts

nlogn Sorts

Quicksort Analysis

References

Sorting

- Our example: “A deck of cards”
- Sorted or unsorted?
- Ways and means by which we sort
- Properties to let us choose when to use which

Sorting

- We mostly see comparison sorts
- “Compare” elements in turn
- We determine the desired order
- smallest to biggest is a good default
- when sorted, elements to the left are ≤ elements to the right

Sorting Terms

- Stable
- In-place or in-situ
- swap
- diversion
- equality (duplication)

Sorting Three Elements

- What’s the best case?
- What’s the worst case?
- What’s the average case?
- Distinct? What if duplicates allowed?
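
One way to explore these questions is to count comparisons directly. The sketch below is only an illustration: `sort_three` is a hypothetical helper (not from the slides) that sorts three values and reports how many comparisons it used. Running it over every ordering of three distinct keys shows a best case of 2 compares, a worst case of 3, and an average of 16/6.

```python
from itertools import permutations


def sort_three(a, b, c):
    """Sort three values; return (sorted_tuple, comparisons_used)."""
    comparisons = 1
    if a > b:
        a, b = b, a                      # now a <= b
    comparisons += 1
    if b <= c:
        return (a, b, c), comparisons    # in order after two compares
    comparisons += 1
    if a <= c:
        return (a, c, b), comparisons    # c belongs in the middle
    return (c, a, b), comparisons        # c is the smallest


# Count comparisons over all 6 orderings of three distinct keys.
counts = [sort_three(*p)[1] for p in permutations((1, 2, 3))]
print(min(counts), max(counts), sum(counts) / len(counts))
```

With duplicates allowed the `<=` tests still apply, so equal keys tend to land on the cheaper two-compare paths.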

Sorting Three Elements

Figure: Ways to sort three elements.

Linear Sorts

- “Linear” does not refer to $\Theta(n)$ running time
- In fact, these sorts are generally $\Theta(n^2)$

The Exchange Sorts

- These are often called exchange sorts or linear sorts
  - technically, insertion sort isn’t an exchange
  - linear sort is not about analysis
  - linear sort is about the flow through the list
  - ... technically insertion sort cheats there too
- They are $\Theta(n^2)$ sorts in the average case

The Sorts

- Bubble Sort
  - Knuth identifies one redeeming use with obscure technology
  - when parallelized, it looks like parallelized sifting
- Selection Sort
- Sifting
  - Shaffer’s book calls this Insertion Sort
- (Actual) Insertion Sort

Bubble Sort

- This one is bad, but amusing
- Bubbles up the list
- Always has to go up till the end
  - Insertion/Sifting does not

Bubble Sort Workings

1. The “end” position is one past the last element
2. The first element is the “biggest”
3. If the next element is the “end” position, stop.
4. Point to the next element
5. Compare “biggest” with newly pointed at
   5.1 If the newer item is bigger, it is now the biggest
   5.2 If the newer item is smaller, swap with the biggest item
6. Check if the next element is the “end” position
   6.1 If it is the “end”, the “end” position moves left
   6.2 If it is not the end, go to 4
7. Go to 2
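
The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name and the counters are assumptions added for the example. Carrying the “biggest” item up the list is expressed here as swapping out-of-order neighbours, which has the same effect.

```python
def bubble_sort(items):
    """Sort a list in place with bubble sort; return (comparisons, swaps)."""
    comparisons = swaps = 0
    end = len(items)                  # one past the last unsorted element
    while end > 1:
        for i in range(end - 1):
            comparisons += 1
            # Strict '>' leaves equal keys in their original order.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
        end -= 1                      # the biggest unsorted item is now in place
    return comparisons, swaps


data = [2, 3, 5, 8, 7, 4, 11, 2]
print(bubble_sort(data), data)
```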

Bubble Sort Analysis

- We know after each pass, one more item is in order
- we compare n-1 times, then n-2 times, etc.
- In the best case, we do no swaps, same number of compares
- In the worst case do we swap after every compare?
- What does the average case imply?
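
As a worked version of the comparison count above (a sketch, assuming the basic variant that always finishes every pass), the per-pass compares add up to:

```latex
\[
  (n-1) + (n-2) + \dots + 1
  = \sum_{i=1}^{n-1} i
  = \frac{n(n-1)}{2} \in \Theta(n^2)
\]
```

For swaps, each adjacent swap removes exactly one inverted pair. A reverse-sorted input has all $\frac{n(n-1)}{2}$ pairs inverted, and a random ordering of distinct keys has about half of them inverted on average, roughly $\frac{n(n-1)}{4}$ swaps.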

Bubble Sort Properties

- Stable (if we do ≥?)
- In-place

Selection Sort

- Like Bubble Sort in comparisons
- But fewer swaps
- We only swap once for each pass

Selection Sort Workings

- Maybe make me draw this on the board too...

1. The “end” position is one past the last element
2. The first element is the “biggest”
3. If the next element is the “end” position, stop.
4. Point to the next element
5. Compare “biggest” with newly pointed at
   5.1 If the newer item is bigger, it is now the biggest
6. Check if the next element is the “end” position
   6.1 If it is the “end”
       6.1.1 swap biggest with the one pointed at
       6.1.2 move the “end” to the left
   6.2 If it is not the end, go to 4
7. Go to 2
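
A minimal Python sketch of these steps, under the same assumptions as the walkthrough (the biggest item is selected and moved to the end of the unsorted part). The function name and counters are illustrative, not from the slides.

```python
def selection_sort(items):
    """Sort a list in place with selection sort; return (comparisons, swaps)."""
    comparisons = swaps = 0
    # 'end' is one past the last unsorted element and moves left each pass.
    for end in range(len(items), 1, -1):
        biggest = 0
        for i in range(1, end):
            comparisons += 1
            if items[i] > items[biggest]:
                biggest = i           # remember where the biggest item is
        if biggest != end - 1:        # at most one swap per pass
            items[biggest], items[end - 1] = items[end - 1], items[biggest]
            swaps += 1
    return comparisons, swaps


data = [2, 3, 5, 8, 7, 4, 11, 2]
print(selection_sort(data), data)
```

The comparison count matches bubble sort, but the number of swaps can never exceed the number of passes.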

Selection Sort Analysis

- We know after each pass, one more item is in order
- we compare n-1 times, then n-2 times, etc.
- In the best case, we do no swaps, same number of compares
- In the worst case we swap only once per pass
- What does the average case imply? Still half-n swaps?

Selection Sort Properties

- Stable (if we do ≥?)
- In-place

Sifting

- walk up the list
- where you are at in the list is current position
- everything before current position must be in order
- put current position in order
- advance current position
- things put in order by swapping
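
The description above could be sketched like this. The name `sift_sort` is a placeholder chosen for the example; the point is that the current item is moved into place by repeated swaps with its left neighbour.

```python
def sift_sort(items):
    """Sort a list in place by sifting each item leftwards with swaps."""
    for current in range(1, len(items)):
        # Everything before 'current' is already in order.
        i = current
        while i > 0 and items[i - 1] > items[i]:
            items[i - 1], items[i] = items[i], items[i - 1]
            i -= 1
    return items


print(sift_sort([2, 3, 5, 8, 7, 4, 11, 2]))
```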

Insertion Sort

- Exactly like Sifting except
  - You don’t swap, you slide
- Optimal diversion sort in most cases
- Also fast when list is nearly sorted
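
To make the swap-versus-slide difference concrete, here is a hedged sketch of insertion sort. The current value is held aside, larger items slide one slot right, and the value is written once. The function name is illustrative.

```python
def insertion_sort(items):
    """Sort a list in place by sliding larger items right, then inserting."""
    for current in range(1, len(items)):
        value = items[current]        # hold the item being inserted
        i = current
        while i > 0 and items[i - 1] > value:
            items[i] = items[i - 1]   # slide, one write instead of a swap
            i -= 1
        items[i] = value              # drop the held item into the gap
    return items


print(insertion_sort([2, 3, 5, 8, 7, 4, 11, 2]))
```

On a nearly sorted list the inner loop exits almost immediately, which is why the sort is fast in that case.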

Insertion Sort A1 Example Part 1

2 3 5 8 7 4 11 2

Insertion Sort A1 Example Part 2

2 3 5 8 4 7 2 11

Insertion Sort A1 Example Part 3

Doing Insertion of 4.

2 3 4 5 8 7 2 11

Insertion Sort A1 Example Part 4

Doing Insertion of 7.

2 3 4 5 7 8 2 11

Insertion Sort A1 Example Part 5

Doing Insertion of 2.

2 2 3 4 5 7 8 11

Linear Algorithm Asymptotic Analysis

Figure: A table of our Big-O understanding of linear sorts [1, p.231]

Better than $n^2$

- Can we sort faster than this?
- Definitely. The rest of the course looks at these approaches.
- Today we’ll start with Quicksort, a very popular sort

Shellsort

- It is worth reading about Shellsort
- average case of $O(n^{1.5})$
- makes use of the best case for Insertion Sort
- still not better than modern sorts, but a neat improvement over linear sorts
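
A minimal Shellsort sketch, for illustration only. It assumes the simple halving gap sequence (n/2, n/4, ..., 1), which is a choice made for this example; the running time of Shellsort depends on the gap sequence used. Each pass is an insertion sort over items that are `gap` apart, so later passes see nearly sorted input.

```python
def shellsort(items):
    """Sort a list in place using gapped insertion sorts (halving gaps)."""
    gap = len(items) // 2             # assumed gap sequence: n/2, n/4, ..., 1
    while gap > 0:
        for current in range(gap, len(items)):
            value = items[current]
            i = current
            # Insertion sort among the items that are 'gap' apart.
            while i >= gap and items[i - gap] > value:
                items[i] = items[i - gap]
                i -= gap
            items[i] = value
        gap //= 2
    return items


print(shellsort([2, 3, 5, 8, 7, 4, 11, 2]))
```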

Quicksort

- Quicksort works by partitioning an input in two, then sorting each half recursively
- The partition is made around a chosen “pivot”
- The left partition only has elements smaller than the “pivot”
- The right partition only has elements bigger than the “pivot”
- Each “partition” step takes $\Theta(N)$ operations
- How many “partition” steps are needed?
- Actually, each “partition” step takes gradually fewer operations... why?
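
The idea can be sketched in a few lines of Python. This version is deliberately not in-place: it builds new lists so the partition step is easy to read. The middle-element pivot is an arbitrary choice for the example.

```python
def quicksort(items):
    """Return a sorted copy of items (illustrative, not in-place)."""
    if len(items) <= 1:
        return list(items)            # zero or one item is already sorted
    pivot = items[len(items) // 2]    # assumed pivot choice: middle element
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    bigger = [x for x in items if x > pivot]
    # Each recursive call works on a strictly shorter list.
    return quicksort(smaller) + equal + quicksort(bigger)


print(quicksort([2, 3, 5, 8, 7, 4, 11, 2]))
```

Each partition scans only the sublist it was handed, which hints at why later partition steps take fewer operations.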

Quicksort flavors

- There are two “partitioning schemes”
  - Hoare (my preference)
    - two scanning indices
    - move towards each other till they swap or cross
    - swap when both point at an element on the wrong side (inversion)
  - Lomuto
    - two indices, but only one is scanning
    - swap out-of-place items to the beginning
    - less efficient
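
Both schemes can be sketched side by side. These are common textbook formulations written for illustration; the function names and pivot choices (middle element for Hoare, last element for Lomuto) are assumptions, not taken from the slides.

```python
def hoare_partition(items, lo, hi):
    """Hoare: two indices scan inward and swap inverted pairs."""
    pivot = items[(lo + hi) // 2]
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while items[i] < pivot:       # left index skips items on the correct side
            i += 1
        j -= 1
        while items[j] > pivot:       # right index skips items on the correct side
            j -= 1
        if i >= j:                    # indices crossed: partition is done
            return j
        items[i], items[j] = items[j], items[i]


def lomuto_partition(items, lo, hi):
    """Lomuto: one scanning index, one boundary index."""
    pivot = items[hi]
    boundary = lo                     # items left of 'boundary' are < pivot
    for scan in range(lo, hi):
        if items[scan] < pivot:
            items[boundary], items[scan] = items[scan], items[boundary]
            boundary += 1
    items[boundary], items[hi] = items[hi], items[boundary]
    return boundary                   # the pivot's final position


def quicksort_hoare(items, lo=0, hi=None):
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = hoare_partition(items, lo, hi)
        quicksort_hoare(items, lo, p)         # pivot position not fixed: keep p
        quicksort_hoare(items, p + 1, hi)
    return items


def quicksort_lomuto(items, lo=0, hi=None):
    if hi is None:
        hi = len(items) - 1
    if lo < hi:
        p = lomuto_partition(items, lo, hi)
        quicksort_lomuto(items, lo, p - 1)    # pivot at p is already in place
        quicksort_lomuto(items, p + 1, hi)
    return items


print(quicksort_hoare([2, 3, 5, 8, 7, 4, 11, 2]))
print(quicksort_lomuto([2, 3, 5, 8, 7, 4, 11, 2]))
```

Note the different recursion bounds: in this Hoare formulation the returned index only splits the range, while Lomuto returns the pivot's final position.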

Common variants

- Median of Three
- Diversion
- Tail Recursion
- Introsort (Musser)
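
As one example of these variants, median-of-three pivot selection can be sketched as below. The helper name is hypothetical. It looks at the first, middle and last items of the range and returns the index of the middle value, which avoids the worst pivot on already sorted input.

```python
def median_of_three(items, lo, hi):
    """Return the index of the median of items[lo], items[mid], items[hi]."""
    mid = (lo + hi) // 2
    # Order the three candidate indices by the values they point at.
    candidates = sorted([lo, mid, hi], key=lambda i: items[i])
    return candidates[1]


values = [9, 1, 5, 3, 7]
print(median_of_three(values, 0, len(values) - 1))
```

A partition routine could swap the item at the returned index into its usual pivot slot before scanning.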

Quicksort Sort Properties 1

- Quicksort...
  - is a divide and conquer algorithm
  - works best with good pivot selection
  - is recursive
  - puts a pivot in place every pass
  - is in-place

Quicksort Sort Properties 2

- Is it stable?
- Some neat optimizations to make it fast and stable with many duplicates
- ... might be a bit slower otherwise

Quicksort Analysis

- Best case $\Theta(n \log n)$
- Average case $\Theta(n \log n)$
- Worst case $\Theta(n^2)$
- Note that the average and worst cases have different complexity.
- What does that mean?

Quicksort With Distinct Keys Average-Case Analysis

- Sedgewick’s 1977 piece “Quicksort with Equal Keys” [2]
- He starts with an introductory analysis of inputs with distinct keys
- We will look at that here
- I won’t test you directly on an analysis like this
- ...but you should know how these things happen and what it means!

Quicksort With Distinct Keys 1

- Let us look at comparisons with the pivot
- We assume random inputs
- We assume randomness is maintained on partitioning
- We assume sentinel checks give two extra comparisons (this is an optimization)
- Since input of length 1 is sorted, assume $2 \le N$
- We can then use the recurrence relation as follows (directly from Sedgewick)
- let $C_N$ be the average comparisons given the above
- $C_N = N + 1 + \frac{1}{N} \sum_{1 \le k \le N} (C_{k-1} + C_{N-k})$
- This looks a bit bulky, can we trim it down?

Quicksort With Distinct Keys 2

- given $C_N = N + 1 + \frac{1}{N} \sum_{1 \le k \le N} (C_{k-1} + C_{N-k})$
- We see that the recurrence is just the left and the right partition
- Since the sum must be the total size, the occurrence of one of them must be the same as the occurrence of its complement
- We can thus reduce to:
- $C_N = N + 1 + \frac{2}{N} \sum_{1 \le k \le N} C_{k-1}$

Quicksort With Distinct Keys 3

- $C_N = N + 1 + \frac{2}{N} \sum_{1 \le k \le N} C_{k-1}$
- We know that if the input is empty or there is one item, we have no comparisons.
- We know that the $N + 1$ is just the number of comparisons on the first partitioning pass.
- We know that the last term is the average number of comparisons for each half.
- If we multiply by $N$ we lose the fraction
- $N C_N = N^2 + N + 2 \sum_{1 \le k \le N} C_{k-1}$

Quicksort With Distinct Keys 4

- $N C_N = N^2 + N + 2 \sum_{1 \le k \le N} C_{k-1}$
- we can further reduce by a process called differencing, that is, subtracting the same equation written for $N-1$ (which means we need an $N$ of at least 3)
- $(N-1) C_{N-1} = (N-1)^2 + (N-1) + 2 \sum_{1 \le k \le N-1} C_{k-1}$
- Before we difference, note that the summation can be made the same by subtracting the last term, $C_{N-1}$
- $(N-1) C_{N-1} = (N-1)^2 + (N-1) + 2 \left( \sum_{1 \le k \le N} C_{k-1} - C_{N-1} \right)$

Quicksort With Distinct Keys 5

- Let’s look at the left side first, it’s easy:
- $N C_N - (N-1) C_{N-1}$
- The right side looks more complicated at first:
- $N^2 + N + 2 \sum_{1 \le k \le N} C_{k-1} - (N-1)^2 - (N-1) - 2 \sum_{1 \le k \le N} C_{k-1} + 2 C_{N-1}$
- but we can note right away that because of the last tweak, the summation terms differ only in sign, so we can get rid of them:
- $N^2 + N - (N-1)^2 - (N-1) + 2 C_{N-1}$
- We can then expand the terms to get further savings
- $N^2 + N - N^2 + 2N - 1 - N + 1 + 2 C_{N-1}$
- Which reduces simply to
- $2N + 2 C_{N-1}$

Quicksort With Distinct Keys 6

- We can now show both sides easily
- $N C_N - (N-1) C_{N-1} = 2N + 2 C_{N-1}$
- Isolating the $N C_N$ term again:
- $N C_N = 2N + 2 C_{N-1} + (N-1) C_{N-1}$
- $N C_N = (N+1) C_{N-1} + 2N$

Quicksort With Distinct Keys 7

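- This last step requires some intuition, but we can divide by $N(N+1)$ to see a telescoping pattern
- Dividing gives $\frac{C_N}{N+1} = \frac{C_{N-1}}{N} + \frac{2}{N+1}$; applying this repeatedly down to $C_2$ gives:
- $\frac{C_N}{N+1} = \frac{C_2}{3} + \sum_{3 \le k \le N} \frac{2}{k+1}$
- Given the similarity to the harmonic series, we can simplify knowing that:
- $H_N = \sum_{1 \le k \le N} \frac{1}{k}$
- Personally, I see it more clearly by knowing that we can pull the constant 2 out of the summation, and by adjusting the limits we can get rid of the $+1$ in the denominator.
- $\frac{C_N}{N+1} = \frac{C_2}{3} + 2 \sum_{4 \le k \le N+1} \frac{1}{k}$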

Quicksort With Distinct Keys 8

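- Given $H_N = \sum_{1 \le k \le N} \frac{1}{k}$
- and $\frac{C_N}{N+1} = \frac{C_2}{3} + 2 \sum_{4 \le k \le N+1} \frac{1}{k}$
- We can write the summation term using the harmonic number, that is
- $2 \left( H_{N+1} - 1 - \frac{1}{2} - \frac{1}{3} \right)$
- Since $C_2 = 3$, we have $\frac{C_2}{3} = 1$, and we can push that constant in with the harmonic term (we have to divide it by two to get it inside the parentheses):
- $C_N = 2(N+1) \left( H_{N+1} + \frac{1}{2} - 1 - \frac{1}{2} - \frac{1}{3} \right)$
- or more simply $C_N = 2(N+1) \left( H_{N+1} - \frac{4}{3} \right)$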

Quicksort With Distinct Keys 9

- $C_N = 2(N+1) \left( H_{N+1} - \frac{4}{3} \right)$ when $N \ge 2$
- Since the rate of growth of the harmonic series is $\Theta(\log N)$, the average case $C_N$ must be $\Theta(N \log N)$
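
As a sanity check on the derivation, the recurrence and the closed form can be compared numerically. The sketch below uses exact fractions and assumes $C_0 = C_1 = 0$ as stated earlier. The function names are illustrative.

```python
from fractions import Fraction


def average_comparisons(n_max):
    """Return [C_0, ..., C_n_max] from C_N = N + 1 + (2/N) * sum(C_0..C_{N-1}).

    Assumes n_max >= 1 and C_0 = C_1 = 0.
    """
    c = [Fraction(0), Fraction(0)]
    running_sum = Fraction(0)         # holds C_0 + ... + C_{N-1}
    for n in range(2, n_max + 1):
        running_sum += c[n - 1]
        c.append(n + 1 + Fraction(2, n) * running_sum)
    return c


def closed_form(n):
    """Evaluate 2(N+1)(H_{N+1} - 4/3) exactly, for N >= 2."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 2))
    return 2 * (n + 1) * (harmonic - Fraction(4, 3))


table = average_comparisons(50)
print(all(table[n] == closed_form(n) for n in range(2, 51)))
```

The two agree for every N tried here, including $C_2 = 3$, the base value used in the telescoping step.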

References I

[1] Clifford A. Shaffer. Data Structures and Algorithm Analysis in Java. 2013.

[2] Robert Sedgewick. Quicksort with equal keys. SIAM Journal on Computing, 6(2):240–267, 1977.