Chapter 7 Sorting

Instructors: C. Y. Tang and J. S. Roger Jang. All the material is integrated from the textbook "Fundamentals of Data Structures in C", with some supplements from the slides of Prof. Hsin-Hsi Chen (NTU).


Page 2: Chapter 7 Sorting

Sequential Search

/* maximum size of list plus one */
#define MAX_SIZE 1000

typedef struct {
    int key;
    /* other fields */
} element;

element list[MAX_SIZE];

Page 3: Chapter 7 Sorting

Sequential Search

int seqsearch(element list[], int searchnum, int n)
{
    /* Search an array, list, that holds n records. Return i if
       list[i].key == searchnum; return -1 if searchnum is not in the list.
       list[n] must exist because it is used as the sentinel slot. */
    int i;
    list[n].key = searchnum;    /* sentinel that signals the end of the list */
    for (i = 0; list[i].key != searchnum; i++)
        ;
    return (i < n) ? i : -1;
}

Page 4: Chapter 7 Sorting

Analysis of Seq Search

Example: 44, 55, 12, 42, 94, 18, 06, 67

An unsuccessful search makes n+1 comparisons.

The average number of comparisons for a successful search is

$\sum_{i=0}^{n-1} \frac{i+1}{n} = \frac{n+1}{2}$

Page 5: Chapter 7 Sorting

Binary Search

Input: a sorted list (i.e., list[0].key ≤ list[1].key ≤ ... ≤ list[n-1].key)

Compare searchnum with list[middle].key, where middle = (n-1)/2. There are three possible outcomes:

searchnum < list[middle].key: search list[0] through list[middle-1]
searchnum = list[middle].key: the search terminates successfully
searchnum > list[middle].key: search list[middle+1] through list[n-1]

Page 6: Chapter 7 Sorting

Binary Search

Input: 4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95

It is easy to see that binary search makes no more than O(log n) comparisons (i.e., the height of the decision tree).

Decision tree for binary search (key [index]; indentation shows depth):

46 [5]
    17 [2]
        4 [0]
            15 [1] (right child)
        26 [3]
            30 [4] (right child)
    58 [8]
        48 [6]
            56 [7] (right child)
        90 [10]
            82 [9]
            95 [11]

Page 7: Chapter 7 Sorting

Binary Search

int binsearch(element list[], int searchnum, int n)
{
    /* search list[0], ..., list[n-1] for searchnum */
    int left = 0, right = n-1, middle;
    while (left <= right) {
        middle = (left + right) / 2;
        /* COMPARE(x, y) yields -1, 0, or 1 for x < y, x == y, x > y */
        switch (COMPARE(list[middle].key, searchnum)) {
        case -1: left = middle + 1; break;
        case 0:  return middle;
        case 1:  right = middle - 1;
        }
    }
    return -1;
}
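The slide's binsearch relies on a COMPARE macro that is not shown in the transcript. A minimal self-contained sketch follows, assuming COMPARE is a three-way comparison returning -1, 0, or 1; the macro body, the reduced element type, and the name binsearch_demo are assumptions made for illustration.

```c
/* Assumed definition of the three-way comparison used by binsearch. */
#define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)

typedef struct {
    int key;
} element;

/* Return the index of searchnum in the sorted array list[0..n-1], or -1. */
int binsearch_demo(const element list[], int searchnum, int n)
{
    int left = 0, right = n - 1;
    while (left <= right) {
        /* same midpoint as (left + right) / 2, but cannot overflow */
        int middle = left + (right - left) / 2;
        switch (COMPARE(list[middle].key, searchnum)) {
        case -1: left = middle + 1; break;   /* searchnum lies to the right */
        case 0:  return middle;
        case 1:  right = middle - 1; break;  /* searchnum lies to the left */
        }
    }
    return -1;
}
```

On the 12-key input of the decision tree slide, the first probe is index 5 (key 46), which is the root of that tree.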

Page 8: Chapter 7 Sorting

List Verification

Compare lists to verify that they are identical, or to identify the discrepancies. The problem of list verification is an instance of repeatedly searching one list, using each key in the other list as the search key.

Example: Internal Revenue Service (e.g., employee vs. employer reports)

Complexities:
random order: O(mn)
ordered list: O(tsort(n) + tsort(m) + m + n)

Page 9: Chapter 7 Sorting

/* compare two unordered lists list1 and list2 */
void verify1(element list1[], element list2[], int n, int m)
{
    int i, j;
    int marked[MAX_SIZE];
    for (i = 0; i < m; i++)
        marked[i] = FALSE;
    for (i = 0; i < n; i++)
        if ((j = seqsearch(list2, list1[i].key, m)) < 0)
            printf("%d is not in list 2\n", list1[i].key);
        else
            /* mark the matching record of list2 as found */
            marked[j] = TRUE;
    for (i = 0; i < m; i++)
        if (!marked[i])
            printf("%d is not in list 1\n", list2[i].key);
}

Verifying using a sequential search

Page 10: Chapter 7 Sorting

/* Same task as verify1, but list1 and list2 are sorted first */
void verify2(element list1[], element list2[], int n, int m)
{
    int i, j;
    sort(list1, n);
    sort(list2, m);
    i = j = 0;
    while (i < n && j < m)
        if (list1[i].key < list2[j].key) {
            printf("%d is not in list 2\n", list1[i].key);
            i++;
        }
        else if (list1[i].key == list2[j].key) {
            i++; j++;
        }
        else {
            printf("%d is not in list 1\n", list2[j].key);
            j++;
        }
    for (; i < n; i++)
        printf("%d is not in list 2\n", list1[i].key);
    for (; j < m; j++)
        printf("%d is not in list 1\n", list2[j].key);
}

Fast verification of two lists

Page 12: Chapter 7 Sorting

Complexities

verify1: O(mn) (randomly arranged lists)
verify2: O(tsort(n) + tsort(m) + m + n) (sorted lists)

Page 13: Chapter 7 Sorting

Sorting Problem

Definitions
We are given a list of records (R0, R1, ..., Rn-1); each record Ri has a key value Ki.
Find a permutation σ such that:
sorted: $K_{\sigma(i-1)} \le K_{\sigma(i)}$ for $0 < i \le n-1$.
stable: if i < j and Ki = Kj in the input list, then Ri precedes Rj in the sorted list.

Criteria
# of key comparisons
# of data movements

Page 14: Chapter 7 Sorting

Insertion sort

Insert a record Ri into a sequence of ordered records R0, R1, ..., Ri-1 (K0 ≤ K1 ≤ ... ≤ Ki-1).

Time complexity: O(n^2)
Stable
The fastest sort when n is small (n ≤ 20)

Page 15: Chapter 7 Sorting

Insertion sort

26 5 77 1 61 11 59 15 48 19

5 26 77 1 61 11 59 15 48 19

5 26 77 1 61 11 59 15 48 19

1 5 26 77 61 11 59 15 48 19

1 5 26 61 77 11 59 15 48 19

1 5 11 26 61 77 59 15 48 19

1 5 11 26 59 61 77 15 48 19

1 5 11 15 26 59 61 77 48 19

1 5 11 15 26 48 59 61 77 19

1 5 11 15 19 26 48 59 61 77

Page 16: Chapter 7 Sorting

Insertion sort

/* perform an insertion sort on the list */
void insertion_sort(element list[], int n)
{
    int i, j;
    element next;
    for (i = 1; i < n; i++) {
        next = list[i];
        for (j = i-1; j >= 0 && next.key < list[j].key; j--)
            list[j+1] = list[j];
        list[j+1] = next;
    }
}

Page 17: Chapter 7 Sorting

Insertion sort

worst case

i    [0] [1] [2] [3] [4]
-     5   4   3   2   1
1     4   5   3   2   1
2     3   4   5   2   1
3     2   3   4   5   1
4     1   2   3   4   5

$O\left(\sum_{i=0}^{n-1} i\right) = O(n^2)$

Page 18: Chapter 7 Sorting

Insertion sort

best case

i    [0] [1] [2] [3] [4]
-     2   3   4   5   1
1     2   3   4   5   1
2     2   3   4   5   1
3     2   3   4   5   1
4     1   2   3   4   5

O(n)

Only the last record is left out of order (LOO).

Page 19: Chapter 7 Sorting

Insertion sort

Ri is LOO iff $R_i < \max_{0 \le j < i} \{R_j\}$

k: # of records LOO

Computing time: O((k+1)n)

Example (* marks a LOO record): 44 55 12* 42* 94 18* 06* 67*
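The LOO definition can be checked mechanically. The sketch below counts k for an array of plain int keys; the function name count_loo and the use of bare ints instead of the element type are assumptions made for illustration.

```c
/* Count records that are left out of order (LOO): keys[i] is LOO when it
   is smaller than the largest key that precedes it. */
int count_loo(const int keys[], int n)
{
    int i, k = 0;
    int max_so_far;
    if (n <= 0)
        return 0;
    max_so_far = keys[0];
    for (i = 1; i < n; i++) {
        if (keys[i] < max_so_far)
            k++;                    /* keys[i] is LOO */
        else
            max_so_far = keys[i];   /* new running maximum */
    }
    return k;
}
```

For the keys 44, 55, 12, 42, 94, 18, 06, 67 this gives k = 5, matching the five starred records.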

Page 20: Chapter 7 Sorting

Variation

Binary insertion sort
sequential search --> binary search
reduces the # of comparisons; the # of moves is unchanged

List insertion sort
array --> linked list
sequential search is kept; the # of moves --> 0
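A minimal sketch of the binary insertion sort variation, assuming plain int keys rather than the element records used elsewhere (the function name is hypothetical). The binary search finds the position after the last key that is less than or equal to the new one, which keeps the sort stable.

```c
/* Binary insertion sort on int keys: binary search locates the insertion
   point, then records are shifted exactly as in plain insertion sort. */
void binary_insertion_sort(int a[], int n)
{
    int i, j;
    for (i = 1; i < n; i++) {
        int next = a[i];
        int lo = 0, hi = i;          /* insertion point lies in [lo, hi] */
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] <= next)      /* <= keeps equal keys in input order */
                lo = mid + 1;
            else
                hi = mid;
        }
        for (j = i; j > lo; j--)     /* the moves are not reduced */
            a[j] = a[j - 1];
        a[lo] = next;
    }
}
```

Comparisons drop to O(log i) per insertion, but the shifting loop still makes the worst case O(n^2).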

Page 21: Chapter 7 Sorting

Quick Sort (C.A.R. Hoare)

Given (R0, R1, ..., Rn-1), pick a pivot key Ki. If Ki is placed in position s(i), then:
Kj ≤ Ks(i) for j < s(i)
Kj ≥ Ks(i) for j > s(i)

R0, ..., Rs(i)-1 | Rs(i) | Rs(i)+1, ..., Rn-1

The pivot splits the list into two partitions.

Page 22: Chapter 7 Sorting

Quick sort example Given 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19)

Page 23: Chapter 7 Sorting

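Quick sort

void quicksort(element list[], int left, int right)
{
    /* sort list[left], ..., list[right] into nondecreasing order;
       list[left].key is the pivot. The i scan stops only at a key >= pivot,
       so list[right+1].key must be at least as large as every key in the
       sublist. */
    int pivot, i, j;
    element temp;
    if (left < right) {
        i = left; j = right+1;
        pivot = list[left].key;
        do {
            do i++; while (list[i].key < pivot);
            do j--; while (list[j].key > pivot);
            if (i < j) SWAP(list[i], list[j], temp);
        } while (i < j);
        SWAP(list[left], list[j], temp);
        quicksort(list, left, j-1);
        quicksort(list, j+1, right);
    }
}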

Page 24: Chapter 7 Sorting

Analysis for quick sort

Assume that each time a record is positioned, the list is divided into two parts of roughly the same size.

Positioning a record in a list of n elements takes O(n) time. Let T(n) be the time taken to sort n elements:

T(n) ≤ cn + 2T(n/2), for some constant c
     ≤ cn + 2(cn/2 + 2T(n/4))
     ≤ 2cn + 4T(n/4)
     ...
     ≤ cn log2 n + nT(1) = O(n log n)

Page 25: Chapter 7 Sorting

Analysis for quick sort

Time complexity:
Worst case: O(n^2)
Best case: O(n log n)
Average case: O(n log n)

Space complexity:
Worst case: O(n)
Best case: O(log n)
Average case: O(log n)

Unstable

Page 26: Chapter 7 Sorting

Variation

Our version of quick sort always picks the key of the first record in the current sublist as the pivot.

A better choice for this pivot is the median of the first, middle, and last keys in the current sublist.
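One way to pick the median-of-three pivot is sketched below, assuming plain int keys; the function name median_of_three is hypothetical and not part of the slides' code.

```c
/* Return the index (left, middle, or right) that holds the median of the
   first, middle, and last keys of a[left..right]. */
int median_of_three(const int a[], int left, int right)
{
    int mid = left + (right - left) / 2;
    int x = a[left], y = a[mid], z = a[right];
    if ((x <= y && y <= z) || (z <= y && y <= x))
        return mid;     /* middle key is the median */
    if ((y <= x && x <= z) || (z <= x && x <= y))
        return left;    /* first key is the median */
    return right;       /* otherwise the last key is the median */
}
```

Swapping the record at the returned index with the first record of the sublist before partitioning would let a first-key-pivot quick sort run unchanged.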

Page 27: Chapter 7 Sorting

Optimal sorting time

How quickly can we hope to sort a list of n objects? If we restrict our question to algorithms that permit only the comparison and interchange of keys, then the answer is O(n log2 n).

Page 28: Chapter 7 Sorting

Decision tree for insertion sort

[Figure: decision tree for insertion sort on three keys. Each internal node is a comparison (K0 ≤ K1, K0 ≤ K2, or K1 ≤ K2) with Yes/No branches; each leaf is a stop node labeled with the resulting permutation.]

Page 29: Chapter 7 Sorting

Optimal sorting time

Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least log2(n!) + 1.

Corollary: Any algorithm that sorts by comparisons only must have a worst case computing time of Ω(n log2 n).
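The step from the theorem to the corollary can be spelled out with the standard counting argument. The sketch below assumes height is counted in nodes, which is where the +1 comes from.

```latex
% A decision tree that sorts n distinct keys needs at least n! leaves,
% and a binary tree of height h (counted in nodes) has at most 2^{h-1} leaves.
\begin{align*}
2^{h-1} \ge n! \quad &\Longrightarrow \quad h \ge \log_2(n!) + 1, \\
n! \ge \left(\tfrac{n}{2}\right)^{n/2} \quad &\Longrightarrow \quad
\log_2(n!) \ge \tfrac{n}{2}\log_2\tfrac{n}{2} = \Omega(n \log_2 n).
\end{align*}
```

The second line uses the fact that the largest n/2 factors of n! are each at least n/2.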

Page 30: Chapter 7 Sorting

Merge sort

How do we merge two sorted lists (list[i], ..., list[m] and list[m+1], ..., list[n]) into a single sorted list (sorted[i], ..., sorted[n])?

There are two algorithms to do this: one needs O(n) additional space, the other only O(1).

Page 31: Chapter 7 Sorting

Merge sort (O(n) space)

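void merge(element list[], element sorted[], int i, int m, int n)
{
    /* merge the sorted lists list[i..m] and list[m+1..n] into sorted[i..n] */
    int j, k, t;
    j = m+1;    /* index into the second sublist */
    k = i;      /* index into the sorted list */
    while (i <= m && j <= n) {
        if (list[i].key <= list[j].key)
            sorted[k++] = list[i++];
        else
            sorted[k++] = list[j++];
    }
    if (i > m)
        /* first sublist exhausted: copy the rest of the second */
        for (t = j; t <= n; t++)
            sorted[k+t-j] = list[t];
    else
        /* second sublist exhausted: copy the rest of the first */
        for (t = i; t <= m; t++)
            sorted[k+t-i] = list[t];
}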

Page 32: Chapter 7 Sorting

Analysis

Time complexity: O(n) Space complexity: O(n)

Page 33: Chapter 7 Sorting

*Figure 7.5: First eight lines for O(1) space merge example (p. 337)

[Figure: two sorted files, file 1 and file 2, merged in place. Steps labeled in the figure: preprocessing (discover n keys, exchange, sort, exchange); sort by the rightmost records of n-1 blocks; then repeated compare and exchange steps.]

Page 34: Chapter 7 Sorting

0 1 2 y w z u x 4 6 8 a v 3 5 7 9 b c e g i j k d f h o p q l m n r s t

0 1 2 3 w z u x 4 6 8 a v y 5 7 9 b c e g i j k d f h o p q l m n r s t

0 1 2 3 4 z u x w 6 8 a v y 5 7 9 b c e g i j k d f h o p q l m n r s t

0 1 2 3 4 5 u x w 6 8 a v y z 7 9 b c e g i j k d f h o p q l m n r s t

Page 35: Chapter 7 Sorting

*Figure 7.6: Last eight lines for O(1) space merge example (p. 338)

6, 7, 8 are merged

Segment one is merged (i.e., 0, 2, 4, 6, 8, a)

Change place marker (longest sorted sequence of records)

Segment one is merged (i.e., b, c, e, g, i, j, k)

Change place marker

Segment one is merged (i.e., o, p, q)

No other segment. Sort the largest keys.

Page 36: Chapter 7 Sorting

O(1) Space merge sort

Page 37: Chapter 7 Sorting

Iterative Merge Sort

We assume that the input sequence has n sorted lists each of length 1.

We merge these lists pairwise to obtain n/2 lists of size 2.

We then merge the n/2 lists pairwise, and so on, until a single list remains.

Page 38: Chapter 7 Sorting

Iterative Merge Sort example

Page 39: Chapter 7 Sorting

merge_pass function

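void merge_pass(element list[], element sorted[], int n, int length)
{
    /* perform one pass of the merge sort: merge adjacent pairs of
       segments of size length from list into sorted */
    int i, j;
    for (i = 0; i < n - 2*length; i += 2*length)
        /* two complete segments: list[i..i+length-1], list[i+length..i+2*length-1] */
        merge(list, sorted, i, i+length-1, i+2*length-1);
    if (i + length < n)
        /* one complete segment and one partial (or final complete) segment */
        merge(list, sorted, i, i+length-1, n-1);
    else
        /* only one segment remains: copy it */
        for (j = i; j < n; j++)
            sorted[j] = list[j];
}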

Page 40: Chapter 7 Sorting

merge_sort function

void merge_sort(element list[], int n)
{
    /* segment lengths grow l, 2l, 4l, ...; two passes per iteration so
       that the result always ends up back in list */
    int length = 1;
    element extra[MAX_SIZE];
    while (length < n) {
        merge_pass(list, extra, n, length);
        length *= 2;
        merge_pass(extra, list, n, length);
        length *= 2;
    }
}

Page 41: Chapter 7 Sorting

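Recursive Formulation of Merge Sort

Input:                 26 5 77 1 61 11 59 15 48 19
Split points:          (1+10)/2 for the whole list, then (1+5)/2 and (6+10)/2 for the halves
Deepest merges:        [5 26] and [11 59]
After the next merges: [5 26 77] [1 61] [11 15 59] [19 48]
After merging halves:  [1 5 26 61 77] [11 15 19 48 59]
Final merge:           1 5 11 15 19 26 48 59 61 77

Sublists that are not merged at a level are carried along by copying.

Data structure: array (copy subfiles) vs. linked list (no copy)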

Page 42: Chapter 7 Sorting

i R0 R1 R2 R3 R4 R5 R6 R7 R8 R9

key 26 5 77 1 61 11 59 15 48 19

link 8 5 -1 1 2 7 4 9 6 0

start=3

Simulation of merge sort

Page 43: Chapter 7 Sorting

Recursive Merge Sort

int rmerge(element list[], int lower, int upper)
{
    /* sort list[lower], ..., list[upper] by links; return the index of the
       record that starts the sorted chain. Every link field must be -1
       before the first call. */
    int middle;
    if (lower >= upper)
        return lower;
    else {
        middle = (lower + upper) / 2;
        return listmerge(list, rmerge(list, lower, middle),
                               rmerge(list, middle+1, upper));
    }
}

Page 44: Chapter 7 Sorting

List Merge

int listmerge(element list[], int first, int second)
{
    /* merge the chains starting at first and second; list[n] is used as a
       temporary header, where n is the (global) number of records */
    int start = n;
    while (first != -1 && second != -1) {
        if (list[first].key <= list[second].key) {
            /* key in first list is lower: link this element to start and
               change start to point to first */
            list[start].link = first;
            start = first;
            first = list[first].link;
        }
        else {
            /* key in second list is lower: link this element into the
               partially sorted list */
            list[start].link = second;
            start = second;
            second = list[second].link;
        }
    }
    if (first == -1)
        list[start].link = second;   /* first is exhausted */
    else
        list[start].link = first;    /* second is exhausted */
    return list[n].link;
}

Recursive merge sort with listmerge runs in O(n log2 n) time.

Page 46: Chapter 7 Sorting

Natural Merge Sort

26 5 77 1 61 11 59 15 48 19

5 26 77 1 11 59 61 15 19 48

1 5 11 26 59 61 77 15 19 48

1 5 11 15 19 26 48 59 61 77

Page 47: Chapter 7 Sorting

Heap Sort

The heap sort algorithm will require only a fixed amount of additional storage and at the same time will have as its worst case and average computing time O(n log n).

While heap sort is slightly slower than merge sort using O(n) additional space, it is faster than merge sort using O(1) additional space.

Page 48: Chapter 7 Sorting

Heap Sort Example

Array interpreted as a binary tree (node [i] has children [2i] and [2i+1]):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     26   5  77   1  61  11  59  15  48   19

Page 49: Chapter 7 Sorting

Heap Sort Example

Max heap following first for loop of heapsort (initial heap):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     77  61  59  48  19  11  26  15   1    5

Next step: exchange [1] and [10].

Page 50: Chapter 7 Sorting

Heap Sort Example

After the first exchange and adjust (77 is in its final position):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     61  48  59  15  19  11  26   5   1   77

Page 51: Chapter 7 Sorting

Heap Sort Example

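After the second exchange and adjust (61 and 77 are in their final positions):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     59  48  26  15  19  11   1   5  61   77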

Page 52: Chapter 7 Sorting

Heap Sort Example

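After the third exchange and adjust (59, 61, and 77 are in their final positions):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     48  19  26  15   5  11   1  59  61   77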

Page 53: Chapter 7 Sorting

Heap Sort Example

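After the fourth exchange and adjust (48, 59, 61, and 77 are in their final positions):

index  [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
key     26  19  11  15   5   1  48  59  61   77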

Page 54: Chapter 7 Sorting

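Heap Sort

void adjust(element list[], int root, int n)
{
    /* adjust the binary tree rooted at root to re-establish the max heap;
       the children of node i are at positions 2i and 2i+1 */
    int child, rootkey;
    element temp;
    temp = list[root];
    rootkey = list[root].key;
    child = 2*root;                       /* left child */
    while (child <= n) {
        if ((child < n) && (list[child].key < list[child+1].key))
            child++;                      /* pick the larger child */
        if (rootkey > list[child].key)
            break;
        else {
            list[child/2] = list[child];  /* move the child up to its parent */
            child *= 2;
        }
    }
    list[child/2] = temp;
}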

Page 55: Chapter 7 Sorting

Heap Sort

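void heapsort(element list[], int n)
{
    /* sort list[1..n] into ascending order using a max heap */
    int i;
    element temp;
    for (i = n/2; i > 0; i--)       /* bottom-up: build the initial max heap */
        adjust(list, i, n);
    for (i = n-1; i > 0; i--) {     /* n-1 cycles */
        SWAP(list[1], list[i+1], temp);   /* move the current maximum to its final place */
        adjust(list, 1, i);               /* top-down: restore the heap on list[1..i] */
    }
}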


Page 57: Chapter 7 Sorting

Radix Sort

We now examine the problem of sorting records that have several keys.

These keys are labeled K0, K1, ..., Kr-1, with K0 being the most significant key and Kr-1 the least.

A list of records R0, ..., Rn-1 is lexically sorted with respect to the keys K0, K1, ..., Kr-1 iff

$(k_i^0, k_i^1, \ldots, k_i^{r-1}) \le (k_{i+1}^0, k_{i+1}^1, \ldots, k_{i+1}^{r-1}), \quad 0 \le i < n-1$

Most significant digit first (MSD): sort on K0, then K1, ...
Least significant digit first (LSD): sort on Kr-1, then Kr-2, ...

Page 58: Chapter 7 Sorting

Arrangement of cards after first pass of an MSD sort

Suits: ♣ < ♦ < ♥ < ♠
Face values: 2 < 3 < 4 < … < J < Q < K < A

(1) MSD sort first, e.g., bin sort with four bins; LSD sort second, e.g., insertion sort within each bin.
(2) LSD sort first, e.g., bin sort with 13 bins (2, 3, 4, …, 10, J, Q, K, A); MSD sort second, e.g., bin sort with four bins.

Page 59: Chapter 7 Sorting

Arrangement of cards after first pass of an LSD sort

Page 60: Chapter 7 Sorting

LSD radix sort example

Page 61: Chapter 7 Sorting

LSD radix sort example (Cont.)

Page 62: Chapter 7 Sorting

LSD radix sort example (Cont.)

Page 63: Chapter 7 Sorting

LSD Radix Sort

In an LSD radix-r sort, the records R0, R1, ..., Rn-1 have keys that are d-tuples (x0, x1, ..., xd-1).

#define MAX_DIGIT 3
#define RADIX_SIZE 10
typedef struct list_node *list_pointer;
struct list_node {
    int key[MAX_DIGIT];
    list_pointer link;
};

Page 64: Chapter 7 Sorting

LSD Radix Sort

list_pointer radix_sort(list_pointer ptr)
{
    /* LSD radix sort on a linked list of records; returns the sorted list */
    list_pointer front[RADIX_SIZE], rear[RADIX_SIZE];
    int i, j, digit;
    for (i = MAX_DIGIT-1; i >= 0; i--) {
        /* initialize bins to be empty queues: O(r) */
        for (j = 0; j < RADIX_SIZE; j++)
            front[j] = rear[j] = NULL;
        /* put records into queues: O(n) */
        while (ptr) {
            digit = ptr->key[i];
            if (!front[digit])
                front[digit] = ptr;
            else
                rear[digit]->link = ptr;
            rear[digit] = ptr;
            ptr = ptr->link;    /* get next record */
        }
        /* reestablish the linked list for the next pass: O(r) */
        ptr = NULL;
        for (j = RADIX_SIZE-1; j >= 0; j--)
            if (front[j]) {
                rear[j]->link = ptr;
                ptr = front[j];
            }
    }
    return ptr;
}

Total time: O(d(n+r)) for d digits, n records, and radix r.
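For comparison, the same LSD idea can be sketched on a plain array, swapping the linked queues for a stable counting pass per digit. This is a different technique from the slides' linked-list version; the names lsd_radix_sort, RADIX, DIGITS, and MAX_KEYS are assumptions, and keys are assumed to be non-negative ints below RADIX^DIGITS.

```c
#define RADIX 10
#define DIGITS 3
#define MAX_KEYS 1000   /* assumed upper bound on n for this sketch */

/* LSD radix sort of a[0..n-1]: one stable counting pass per digit,
   starting with the least significant digit. Assumes n <= MAX_KEYS. */
void lsd_radix_sort(int a[], int n)
{
    int out[MAX_KEYS];
    int count[RADIX];
    int d, i, divisor = 1;

    for (d = 0; d < DIGITS; d++, divisor *= RADIX) {
        for (i = 0; i < RADIX; i++)
            count[i] = 0;
        for (i = 0; i < n; i++)               /* bin sizes */
            count[(a[i] / divisor) % RADIX]++;
        for (i = 1; i < RADIX; i++)           /* end position of each bin */
            count[i] += count[i - 1];
        for (i = n - 1; i >= 0; i--)          /* backward scan keeps the pass stable */
            out[--count[(a[i] / divisor) % RADIX]] = a[i];
        for (i = 0; i < n; i++)
            a[i] = out[i];
    }
}
```

Each pass costs O(n + r), so d passes give the same O(d(n+r)) bound as the queue-based version.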

Page 66: Chapter 7 Sorting

List and Table Sort

Most sorting methods require excessive data movement since we must physically move records following some comparisons.

If the records are large, this slows down the sorting process.

We can reduce data movement by using a linked list representation.

Page 67: Chapter 7 Sorting

List and Table Sort

However, in some applications we must physically rearrange the records so that they are in the required order.

We can achieve considerable savings by first performing a linked list sort and then physically rearranging the records according to the order specified in the list.

Page 68: Chapter 7 Sorting

List and Table Sort

In this section, we examine three methods for rearranging the lists into sorted order.

list_sort1 and list_sort2 require a linked list representation.

table_sort uses an auxiliary table that indirectly references the list's records.

Page 69: Chapter 7 Sorting

In Place Sorting

i      R0  R1  R2  R3  R4  R5  R6  R7  R8  R9
key    26   5  77   1  61  11  59  15  48  19
link    8   5  -1   1   2   7   4   9   6   0

R0 <--> R3

key     1   5  77  26  61  11  59  15  48  19
link    1   5  -1   8   2   7   4   9   6   3

How do we know where the predecessor is? Its link must be modified as well.

Page 70: Chapter 7 Sorting

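list_sort 1

void list_sort1(element list[], int n, int start)
{
    /* start points to the record with the smallest key; rearrange the
       records into sorted order */
    int i, last, current;
    element temp;
    last = -1;
    /* convert the chain into a doubly linked list using linkb */
    for (current = start; current != -1; current = list[current].link) {
        list[current].linkb = last;
        last = current;
    }
    for (i = 0; i < n-1; i++) {
        if (start != i) {
            /* move list[start] to position i while maintaining the list */
            if (list[i].link+1)
                list[list[i].link].linkb = start;
            list[list[i].linkb].link = start;
            SWAP(list[start], list[i], temp);
        }
        start = list[i].link;
    }
}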

Page 71: Chapter 7 Sorting

Example of list_sort 1

Page 72: Chapter 7 Sorting

Example of list_sort 1

Page 73: Chapter 7 Sorting

Example of list_sort 1

Page 74: Chapter 7 Sorting

list_sort 1

list_sort 1 requires converting the singly linked list into a doubly linked one.

However, we can avoid this by using list_sort 2.

Page 75: Chapter 7 Sorting

list_sort 2

/* list sort with only one link field */
void list_sort2(element list[], int n, int start)
{
    int i, next;
    element temp;
    for (i = 0; i < n-1; i++) {
        while (start < i)
            start = list[start].link;
        next = list[start].link;    /* save index of next largest key */
        if (start != i) {
            SWAP(list[i], list[start], temp);
            list[i].link = start;
        }
        start = next;
    }
}

Page 76: Chapter 7 Sorting

Example of list_sort 2

Page 77: Chapter 7 Sorting

Example of list_sort 2

Page 78: Chapter 7 Sorting

Example of list_sort 2

Page 79: Chapter 7 Sorting

Table sort

In the previous section, we used a linked list to represent the sorted order.

Now we use an auxiliary one-dimensional table t to represent it (i.e., R[t[i]] is the record with the ith smallest key).

How do we physically rearrange the records according to this table?

Page 80: Chapter 7 Sorting

table_sort

void table_sort(element list[], int n, int table[])
{
    /* rearrange list[0], ..., list[n-1] to correspond to the sequence
       list[table[0]], ..., list[table[n-1]] */
    int i, current, next;
    element temp;
    for (i = 0; i < n-1; i++)
        if (table[i] != i) {
            /* nontrivial cycle starting at i */
            temp = list[i];
            current = i;
            do {
                next = table[current];
                list[current] = list[next];
                table[current] = current;
                current = next;
            } while (table[current] != i);
            list[current] = temp;
            table[current] = current;
        }
}

Page 82: Chapter 7 Sorting

table_sort

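        R0  R1  R2  R3  R4
Key     50   9  11   8   3

Auxiliary table T before sort:  0  1  2  3  4
Auxiliary table T after sort:   4  3  1  2  0

The record ranked 0th (smallest key) is at position 4, the record ranked 1st is at position 3, ..., and the record ranked 4th is at position 0.

Data are not moved; only T changes, from the identity (0,1,2,3,4) to the permutation (4,3,1,2,0). A linked list sort likewise leaves the records in place.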

Page 83: Chapter 7 Sorting

table_sort

Every permutation is made of disjoint cycles.

Example:
        R0  R1  R2  R3  R4  R5  R6  R7
key     35  14  12  42  26  50  31  18
table    2   1   7   4   6   0   3   5

Two nontrivial cycles:
R0 -> R2 -> R7 -> R5 -> R0    (table entries 2, 7, 5, 0)
R3 -> R4 -> R6 -> R3          (table entries 4, 6, 3)

Trivial cycle:
R1 -> R1                      (table entry 1)
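The cycle structure of a table can be checked with a short sketch. The function name count_nontrivial_cycles and the bound MAX_PERM are assumptions; the table is assumed to be a permutation of 0..n-1.

```c
#define MAX_PERM 1000   /* assumed upper bound on n for this sketch */

/* Count the cycles of length greater than one in the permutation
   table[0..n-1]. Assumes n <= MAX_PERM. */
int count_nontrivial_cycles(const int table[], int n)
{
    int visited[MAX_PERM] = {0};
    int i, cycles = 0;
    for (i = 0; i < n; i++) {
        int current;
        if (visited[i])
            continue;               /* already part of an earlier cycle */
        if (table[i] == i) {        /* trivial cycle: record is in place */
            visited[i] = 1;
            continue;
        }
        /* walk the cycle that starts at i, marking every member */
        for (current = i; !visited[current]; current = table[current])
            visited[current] = 1;
        cycles++;
    }
    return cycles;
}
```

For the table 2, 1, 7, 4, 6, 0, 3, 5 the function returns 2, matching the two nontrivial cycles of the example.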

Page 84: Chapter 7 Sorting

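table_sort

(1) Cycle R0 -> R2 -> R7 -> R5 -> R0 (i = 0, temp holds key 35):
    position   0   2   7   5
    new key   12  18  50  35
    table entries at these positions are reset to t[i] = i

(2) i = 1, 2: t[i] = i, so nothing is moved.

(3) Cycle R3 -> R4 -> R6 -> R3 (i = 3, temp holds key 42):
    position   3   4   6
    new key   26  31  42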

Page 85: Chapter 7 Sorting

Summary of internal sorting

Of the several sorting methods we have studied no one method is best. Some methods are good for small n, others for large n.

An insertion sort works well when the list is already partially ordered. It is also the best sorting method for small n.

Merge sort has the best worst case behavior, but it requires more storage than a heap sort, and has slightly more overhead than quick sort.

Quick sort has the best average behavior, but its worst case behavior is O(n^2).

The behavior of radix sort depends on the size of the keys and the choice of the radix.

Page 86: Chapter 7 Sorting

Practical Considerations for Internal Sorting

Data movement slows down the sorting process.
insertion sort and merge sort --> linked file
perform a linked list sort + rearrange records

Page 87: Chapter 7 Sorting

Complexity of Sort

Sort            Stability  Space     Best        Average     Worst
Bubble Sort     stable     little    O(n)        O(n^2)      O(n^2)
Insertion Sort  stable     little    O(n)        O(n^2)      O(n^2)
Quick Sort      unstable   O(log n)  O(n log n)  O(n log n)  O(n^2)
Merge Sort      stable     O(n)      O(n log n)  O(n log n)  O(n log n)
Heap Sort       unstable   little    O(n log n)  O(n log n)  O(n log n)
Radix Sort      stable     O(np)     O(n log n)  O(n log n)  O(n log n)
List Sort       ?          O(n)      O(1)        O(n)        O(n)
Table Sort      ?          O(n)      O(1)        O(n)        O(n)

Page 88: Chapter 7 Sorting

Comparison

n < 20: insertion sort

20 ≤ n < 45: quick sort

n ≥ 45: merge sort

hybrid methods:
merge sort + quick sort
merge sort + insertion sort

Page 89: Chapter 7 Sorting

External Sort

To sort large files (cannot be contained in the internal memory).

The following overheads apply:
Seek time
Latency time
Transmission time

The most popular method for sorting on external storage devices is merge sort.
Phase 1: Segment the input file and sort the segments (runs).
Phase 2: Merge the runs.

Page 90: Chapter 7 Sorting

File: 4500 records, A1, …, A4500
Internal memory: 750 records (3 blocks)
Block length: 250 records
Input disk vs. scratch pad (disk)

(1) Sort three blocks at a time and write them out onto the scratch pad: six runs of 750 records each.
(2) Use the three blocks of memory as two input buffers and one output buffer, and merge:
    6 runs of 750 records -> 3 runs of 1500 records -> 1 run of 3000 records (plus 1 run of 1500 records) -> 1 run of 4500 records

2 2/3 passes over the data

Page 91: Chapter 7 Sorting

Time Complexity of External Sort

Input/output time
ts = maximum seek time
tl = maximum latency time
trw = time to read/write one block of 250 records
tIO = ts + tl + trw

CPU processing time
tIS = time to internally sort 750 records
n tm = time to merge n records from the input buffers to the output buffer

Page 92: Chapter 7 Sorting

Time Complexity of External Sort

Operation                                                  Time
(1) read 18 blocks of input (18 tIO), internally sort
    (6 tIS), write 18 blocks (18 tIO)                      36 tIO + 6 tIS
(2) merge runs 1-6 in pairs                                36 tIO + 4500 tm
(3) merge two runs of 1500 records each, 12 blocks         24 tIO + 3000 tm
(4) merge one run of 3000 records with one run of
    1500 records                                           36 tIO + 4500 tm
Total time                                                 132 tIO + 12000 tm + 6 tIS

Page 93: Chapter 7 Sorting

Consider Parallelism

Carry out the CPU operations and the I/O operations in parallel.

132 tIO = 12000 tm + 6 tIS
Two disks: 132 tIO is reduced to 66 tIO

Page 94: Chapter 7 Sorting

k-way Merging

1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

A 4-way merge on 16 runs

2 passes (4-way) vs. 4 passes (2-way)

Page 95: Chapter 7 Sorting

k-way Merging

A k-way merge on m runs requires at most ⌈log_k m⌉ passes over the data.

To begin with, k runs of size S1, S2, ..., Sk can no longer be merged internally in $O(\sum_{i=1}^{k} S_i)$ time.

The most direct way to merge k runs would be to make k-1 comparisons to determine the next record to output.

Hence, the total number of key comparisons being made is n(k-1) log_k m.

We can achieve a significant reduction in the number of comparisons by using a loser tree with k leaves (Chapter 5).
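The pass count can be computed without floating point, as in the sketch below. The function name merge_passes is an assumption; m ≥ 1 and k ≥ 2 are assumed.

```c
/* Number of passes a k-way merge needs to reduce m runs to a single run,
   i.e. the smallest p with k^p >= m. Assumes m >= 1 and k >= 2. */
int merge_passes(int m, int k)
{
    int passes = 0;
    while (m > 1) {
        m = (m + k - 1) / k;    /* one pass merges groups of up to k runs */
        passes++;
    }
    return passes;
}
```

With 16 runs this gives 2 passes for a 4-way merge and 4 passes for a 2-way merge, as in the 16-run example.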

Page 96: Chapter 7 Sorting

Optimal Merging Of Runs

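Runs of length 2, 4, 5, and 15 can be merged with two different merge trees.

External path length:
first tree (merge 2 and 4, then 5, then 15): 2*3 + 4*3 + 5*2 + 15*1 = 43
second tree (all four runs at depth 2): 2*2 + 4*2 + 5*2 + 15*2 = 52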

Page 97: Chapter 7 Sorting

Optimal Merging Of Runs

When we merge by using the first merge tree, we merge some records only once, while others may be merged up to three times.

In the second merge tree, we merge each record exactly twice.

We can construct a Huffman tree to solve this problem.
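The cost of the Huffman merge order can be computed with a simple greedy loop: repeatedly merge the two shortest runs and add the merged length to the total. The sketch below uses a linear scan instead of a heap for brevity; the names optimal_merge_cost and MAX_RUNS are assumptions.

```c
#define MAX_RUNS 64   /* assumed bound on the number of runs */

/* Total number of records moved when runs are merged two at a time,
   always merging the two shortest runs first (Huffman's rule).
   Assumes 1 <= n <= MAX_RUNS. */
long optimal_merge_cost(const int lengths[], int n)
{
    long runs[MAX_RUNS];
    long total = 0;
    int i, count = n;

    for (i = 0; i < n; i++)
        runs[i] = lengths[i];

    while (count > 1) {
        int a = 0, b = 1, t;    /* a: shortest run, b: second shortest */
        long merged;
        if (runs[b] < runs[a]) { t = a; a = b; b = t; }
        for (i = 2; i < count; i++) {
            if (runs[i] < runs[a]) { b = a; a = i; }
            else if (runs[i] < runs[b]) { b = i; }
        }
        merged = runs[a] + runs[b];
        total += merged;
        runs[a] = merged;               /* the merged run replaces run a */
        runs[b] = runs[count - 1];      /* the last run fills slot b */
        count--;
    }
    return total;
}
```

For runs of length 2, 4, 5, 15 the result is 43, the external path length of the better merge tree.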

Page 98: Chapter 7 Sorting

Construction of a Huffman tree

Runs of length: 2, 3, 5, 7, 9, 13

Step 1: merge 2 and 3 into a node of weight 5.

Page 99: Chapter 7 Sorting

Construction of a Huffman tree

Runs of length: 2, 3, 5, 7, 9, 13

Step 2: merge 5 (= 2 + 3) and 5 into a node of weight 10.

Page 100: Chapter 7 Sorting

Construction of a Huffman tree

Runs of length: 2, 3, 5, 7, 9, 13

Step 3: merge 7 and 9 into a node of weight 16.

Page 101: Chapter 7 Sorting

Construction of a Huffman tree

Runs of length: 2, 3, 5, 7, 9, 13

Step 4: merge 10 and 13 into a node of weight 23.

Page 102: Chapter 7 Sorting

Construction of a Huffman tree

Runs of length: 2, 3, 5, 7, 9, 13

Step 5: merge 16 and 23 into the root of weight 39.