Copyright (C) Gal Kaminka 2003 1 Data Structures and Algorithms Sorting II: Divide and Conquer...

Copyright (C) Gal Kaminka 2003 1

Data Structures and Algorithms

Sorting II:

Divide and Conquer Sorting

Gal A. Kaminka

Computer Science Department

2

Last week: in-place sorting

Bubble Sort: O(n^2) comparisons (O(n) in the best case), O(n^2) exchanges

Selection Sort: O(n^2) comparisons (also O(n^2) in the best case), O(n) exchanges (always)

Insertion Sort: O(n^2) comparisons (O(n) in the best case), fewer exchanges than bubble sort, best in practice for small lists (<30)

3

This week

MergeSort: O(n log n) always, O(n) storage

QuickSort: O(n log n) average, O(n^2) worst case; good in practice (>30), O(log n) storage

4

MergeSort

A divide-and-conquer technique. Each unsorted collection is split into 2, then each part is split again, and again, until we have collections of size 1.

Now we merge sorted collections, then again, and again, until we merge the two halves.

5

MergeSort(array a, indexes low, high)
1. If (low < high)
2.     middle ← (low + high)/2
3.     MergeSort(a,low,middle) // split 1
4.     MergeSort(a,middle+1,high) // split 2
5.     Merge(a,low,middle,high) // merge 1+2

6

Merge(array a, indexes low, mid, high)
1. b ← empty array, t ← mid+1, i ← low, tl ← low
2. while (tl<=mid AND t<=high)
3.     if (a[tl]<=a[t])
4.         b[i] ← a[tl]
5.         i ← i+1, tl ← tl+1
6.     else
7.         b[i] ← a[t]
8.         i ← i+1, t ← t+1
9. if tl<=mid copy a[tl…mid] into b[i…]
10. else if t<=high copy a[t…high] into b[i…]
11. copy b[low…high] onto a[low…high]
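The two routines above can be sketched in Python as follows. This is a minimal sketch, not the course's reference code: the names `merge` and `merge_sort` are chosen here, indexes are assumed to be 0-based and inclusive, and a Python list stands in for the temporary array b.

```python
def merge(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high] (inclusive bounds)."""
    b = []                      # temporary storage for the merged run
    tl, t = low, mid + 1        # read positions in the left and right runs
    while tl <= mid and t <= high:
        if a[tl] <= a[t]:       # <= keeps equal items in their original order
            b.append(a[tl])
            tl += 1
        else:
            b.append(a[t])
            t += 1
    b.extend(a[tl:mid + 1])     # leftovers of the left run (may be empty)
    b.extend(a[t:high + 1])     # leftovers of the right run (may be empty)
    a[low:high + 1] = b         # copy the merged run back over a[low..high]


def merge_sort(a, low, high):
    """Sort a[low..high] in place; merge() uses temporary storage."""
    if low < high:
        middle = (low + high) // 2      # integer division
        merge_sort(a, low, middle)      # split 1
        merge_sort(a, middle + 1, high) # split 2
        merge(a, low, middle, high)     # merge 1+2


data = [25, 57, 48, 37, 12, 92, 86, 33]
merge_sort(data, 0, len(data) - 1)
print(data)
```

Calling `merge_sort(data, 0, len(data) - 1)` sorts the whole list; the sample input is the one used in the example slide.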

7

An example

Initial: 25 57 48 37 12 92 86 33

Split: [25 57 48 37] [12 92 86 33]

Split: [25 57] [48 37] [12 92] [86 33]

Split: [25] [57] [48] [37] [12] [92] [86] [33]

Merge: [25 57] [37 48] [12 92] [33 86]

Merge: [25 37 48 57] [12 33 86 92]

Merge: [12 25 33 37 48 57 86 92]

8

The complexity of MergeSort

With every split, we halve the collection. How many times can this be done?

We are looking for x, where 2^x = n

x = log_2 n

So there are a total of log n levels of splits

9

The complexity of MergeSort

How much run-time does each merge step take?

First merge step: n/2 merges of 2 items → n
Second merge step: n/4 merges of 4 items → n
Third merge step: n/8 merges of 8 items → n
….

How many merge steps? Same as the number of split levels: log n

Total: n log n steps
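The same count can be written as a single sum. This is a sketch under two simplifying assumptions that the slide leaves implicit: n is a power of two, and merging runs with m items in total costs about m steps.

```latex
\[
T(n) \;=\; \sum_{k=1}^{\log_2 n}
      \underbrace{\frac{n}{2^k}}_{\text{merges in step } k} \cdot
      \underbrace{2^k}_{\text{items per merge}}
\;=\; \sum_{k=1}^{\log_2 n} n
\;=\; n \log_2 n
\]
```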

10

Storage complexity of MergeSort

Every merge, we need to hold the merged array in temporary storage.

[Figure: arrays being merged]

11

Storage complexity of MergeSort

So we need temporary storage for merging, of the same size as the two collections together.

To merge the last two sub-arrays (each of size n/2), we need n/2 + n/2 = n temporary storage.

Total: O(n) storage

12

MergeSort summary

O(n log n) runtime (best and worst)
O(n) storage (not in-place)
Very naturally done using recursion, but note that it can also be done without recursion!

In practice: can be improved by combining with insertion sort. Split down to arrays of size 20-30, sort each with insertion sort, then merge.
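One way to do MergeSort without recursion is a bottom-up variant: merge runs of width 1, then 2, then 4, and so on. The sketch below is an illustration only; the function name `merge_sort_bottom_up` and the use of list slices as temporary storage are assumptions of this example, not something the slides specify.

```python
def merge_sort_bottom_up(a):
    """Sort list a in place, without recursion, by merging runs of width 1, 2, 4, ..."""
    n = len(a)
    width = 1
    while width < n:
        for low in range(0, n, 2 * width):
            mid = min(low + width, n)        # end of the left run (exclusive)
            high = min(low + 2 * width, n)   # end of the right run (exclusive)
            left, right = a[low:mid], a[mid:high]   # temporary copies
            i = j = 0
            k = low
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    a[k] = left[i]
                    i += 1
                else:
                    a[k] = right[j]
                    j += 1
                k += 1
            # Exactly one of the two runs still has items; copy them over.
            a[k:high] = left[i:] + right[j:]
        width *= 2
    return a


print(merge_sort_bottom_up([25, 57, 48, 37, 12, 92, 86, 33]))
```

The outer loop runs about log n times and each pass touches every item once, so the n log n step count is the same as in the recursive version.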

13

QuickSort

Key idea:

Select an item (called the pivot)
Put it into its proper FINAL position
Make sure that all greater items are on one side (side 1) and all smaller items are on the other side (side 2)
Repeat for side 1
Repeat for side 2

14

Short example

25 57 48 37 12 92 86 33

Let’s select 25 as our initial pivot. We move items such that:

All items left of 25 are smaller
All items right of 25 are larger

As a result, 25 is now in its final position:

12 25 57 48 37 92 86 33

15

Now, repeat (recursively) for left and right sides

12 25 57 48 37 92 86 33

Sort 12
Sort 57 48 37 92 86 33

12 needs no sorting. For the other side, we repeat the process:

Select a pivot item (let’s take 57)
Move items around such that left items are smaller, etc.

16

12 25 57 48 37 92 86 33

Changes into

12 25 48 37 33 57 92 86

And now we repeat the process for left

12 25 37 33 48 57 92 86

12 25 33 37 48 57 92 86

12 25 33 37 48 57 92 86

And for the right

12 25 33 37 48 57 86 92

12 25 33 37 48 57 86 92

17

QuickSort(array a; index low, hi)
1. if (low >= hi)
2.     return ; // a[low..hi] is sorted
3. pivot ← FindPivot(a,low,hi)
4. p_index=partition(a,low,hi,pivot)
5. QuickSort(a,low,p_index-1)
6. QuickSort(a,p_index+1,hi)

18

Key questions

How do we select an item (FindPivot())?

If we always select the largest item as the pivot, then this process becomes Selection Sort, which is O(n^2)
So this works only if we select items “in the middle”, since then we will have log n divisions

How do we move items around efficiently (Partition())?

Otherwise, the cost of moving items offsets the benefit of partitioning
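The effect of pivot choice can be checked by counting comparisons. The sketch below is a simplified model, not the partition scheme of these slides: it always takes the first item as the pivot, splits with list comprehensions, and counts one comparison per non-pivot item. The helper `median_first` is hypothetical; it arranges the input so that the first item of every sub-list is its median.

```python
def count_comparisons(items):
    """Count item-vs-pivot comparisons when the first item is always the pivot."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # One comparison against the pivot is counted for each item in rest.
    return len(rest) + count_comparisons(smaller) + count_comparisons(larger)


def median_first(sorted_items):
    """Reorder a sorted list so the first item of every sub-list is its median."""
    if not sorted_items:
        return []
    mid = len(sorted_items) // 2
    return ([sorted_items[mid]]
            + median_first(sorted_items[:mid])
            + median_first(sorted_items[mid + 1:]))


n = 127
worst = count_comparisons(list(range(n - 1, -1, -1)))    # pivot is always the largest item
balanced = count_comparisons(median_first(list(range(n))))  # pivot is always the median
print(worst, balanced)
```

For n = 127 the descending input needs n(n-1)/2 = 8001 comparisons, while the median-first input needs 642, which is close to n log n.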

19

FindPivot

To find a real median (middle item) takes O(n). In practice, however, we want this to be O(1). So we approximate, using one of:

Take the first item (a[low]) as the pivot
Take the median of {a[low],a[hi],a[(low+hi)/2]}

FindPivot(array a; index low, high)
1. return a[low]
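The median-of-three option could look like the sketch below. The name `find_pivot` and the choice to return the pivot's index together with its value are assumptions of this example. A partition routine that expects the pivot at a[low] would first need the chosen item swapped into that position.

```python
def find_pivot(a, low, hi):
    """Return (pivot value, pivot index): the median of a[low], a[mid], a[hi]."""
    mid = (low + hi) // 2
    # Sort the three (value, index) pairs and take the middle one.
    candidates = sorted([(a[low], low), (a[mid], mid), (a[hi], hi)])
    return candidates[1]


data = [25, 57, 48, 37, 12, 92, 86, 33]
print(find_pivot(data, 0, len(data) - 1))   # candidates are 25, 37 and 33
```

Only three items are inspected, so the cost stays O(1) regardless of the size of the array.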

20

Partition (in O(n))

Key idea: Keep two indexes into the array

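down starts at low and moves right; everything to its left is <= pivot
up starts at hi and moves left; everything to its right is > pivot

We move up and down through the array. Whenever they point at items that are inconsistent with this, we interchange the two items.

At end: up and down meet or cross, and up is the final location of the pivot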

21

partition(array a; index low,hi ; pivot; index pivot_i)
1. down ← low, up ← hi
2. while(down<up)
3.     while (a[down]<=pivot && down<hi)
4.         down ← down + 1
5.     while (a[up]>pivot)
6.         up ← up - 1
7.     if (down < up)
8.         swap(a[down],a[up])
9. a[pivot_i]=a[up]
10. a[up] = pivot
11. return up
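A direct Python transcription of this partition, together with a QuickSort driver, might look as follows. It is a sketch under stated assumptions: indexes are 0-based and inclusive, the pivot is always a[low] (so pivot_i equals low, which the scan in the second inner loop relies on to stop), and the function names are chosen for this example.

```python
def partition(a, low, hi, pivot, pivot_i):
    """Place pivot (taken from a[pivot_i], assumed to be a[low]) in its final position."""
    down, up = low, hi
    while down < up:
        while a[down] <= pivot and down < hi:
            down += 1                 # skip items that belong on the left
        while a[up] > pivot:
            up -= 1                   # skip items that belong on the right
        if down < up:
            a[down], a[up] = a[up], a[down]   # both items are on the wrong side
    a[pivot_i] = a[up]                # final swap of the pivot with a[up]
    a[up] = pivot
    return up


def quick_sort(a, low, hi):
    """Sort a[low..hi] in place."""
    if low >= hi:
        return                        # zero or one item: already sorted
    pivot, pivot_i = a[low], low      # simplest pivot choice: the first item
    p_index = partition(a, low, hi, pivot, pivot_i)
    quick_sort(a, low, p_index - 1)
    quick_sort(a, p_index + 1, hi)


data = [25, 57, 48, 37, 12, 92, 86, 33]
quick_sort(data, 0, len(data) - 1)
print(data)
```

With 0-based indexes, the first call to `partition` on the sample data returns 1 and leaves 25 in its final position.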

22

Example: partition() with pivot=25

First pass through loop on line 2:

25 57 48 37 12 92 86 33

down at 25, up at 33

23

Example: partition() with pivot=25

First pass through loop on line 2:

25 57 48 37 12 92 86 33

down at 57, up at 33

We go into loop in line 3 (while a[down]<=pivot)

24

Example: partition() with pivot=25

First pass through loop on line 2:

25 57 48 37 12 92 86 33

down at 57, up at 86

We go into loop in line 5 (while a[up]>pivot)

25

Example: partition() with pivot=25

First pass through loop on line 2:

25 57 48 37 12 92 86 33

down at 57, up at 92

We go into loop in line 5 (while a[up]>pivot)

26

Example: partition() with pivot=25

First pass through loop on line 2:

25 57 48 37 12 92 86 33

down at 57, up at 12

Now we found an inconsistency!

27

Example: partition() with pivot=25

First pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 12, up at 57

So we swap a[down] with a[up]

28

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 12, up at 57

29

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 48, up at 57

Move down again (increasing) – loop on line 3

30

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 48, up at 37

Now we begin to move up again – loop on line 5

31

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 48, up at 48

Again – loop on line 5

32

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 48, up at 12

down < up? No. So we don’t swap.

33

Example: partition() with pivot=25

Second pass through loop on line 2:

25 12 48 37 57 92 86 33

down at 48, up at 12

Instead, we are done. Just put pivot in place.

34

Example: partition() with pivot=25

Second pass through loop on line 2:

12 25 48 37 57 92 86 33

down at 48, up at 25

Instead, we are done. Just put pivot in place.

(swap it with a[up] – for us a[low] was the pivot)

35

Example: partition() with pivot=25

Second pass through loop on line 2:

12 25 48 37 57 92 86 33

down at 48, up at 25

Now we return 2 as the new pivot index

36

Notes

We need the initial pivot_index in partition(). For instance, change FindPivot() to return the pivot (a[low]) as well as the initial pivot_index (low), then use pivot_index in the final swap.

QuickSort:

Works very well in practice (collections >30)
Average O(n log n), worst case O(n^2)
Space requirements O(log n), for recursion