Sorting Data


Transcript of Sorting Data

Page 1: Sorting Data

Sorting Data

• Considerations
– Average, best, worst case complexity (for swaps and compares)
– Is extra memory required?
– Difficulty to program?
– Stability of equal keys

• What is the fastest possible sort using comparisons?

Page 2: Sorting Data

Elementary Sorting Methods
Complexity O(N^2); bubble and insertion sort are stable (selection sort, as implemented below with swaps, is not)

• Bubble Sort (N(N-1)/2 comparisons, about N(N-1)/4 swaps)

• Selection Sort (N(N-1)/2 comparisons, N-1 swaps)
– Minimizes the number of swaps
– Worst case equals average case

• Insertion Sort (about N(N-1)/4 comparisons and copies)
– Good for lists that are nearly sorted (O(N) best case)
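A small counting harness makes these figures concrete. The sketch below is not from the slides (class and variable names are mine); it instruments selection sort and confirms exactly N(N-1)/2 comparisons and N-1 swaps regardless of input order:

```java
// Hypothetical harness: count the comparisons and swaps selection sort makes.
public class SelectionCount {
    static int comparisons, swaps;

    static void selectionSort(double[] a) {
        comparisons = 0;
        swaps = 0;
        for (int i = 0; i < a.length - 1; i++) {
            int minimum = i;
            for (int j = i + 1; j < a.length; j++) {
                comparisons++;                         // one compare per inner step
                if (a[j] < a[minimum]) minimum = j;
            }
            double t = a[i]; a[i] = a[minimum]; a[minimum] = t;
            swaps++;                                   // one swap per outer pass
        }
    }

    public static void main(String[] args) {
        double[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};   // n = 10
        selectionSort(data);
        // N(N-1)/2 = 45 comparisons, N-1 = 9 swaps
        System.out.println(comparisons + " comparisons, " + swaps + " swaps");
    }
}
```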

Page 3: Sorting Data

Bubble Sort

pass = 0;
swaps = true;
while (pass < n && swaps == true)
{  swaps = false;
   for (index = 0; index < n - pass - 1; index++)   // -1 keeps index+1 in bounds
   {  if (sortArray[index] > sortArray[index+1])
      {  swap(sortArray, index, index+1);
         swaps = true;
      }
   }
   pass++;
}

Pairwise compares of adjacent elements; swap where necessary

Page 4: Sorting Data

Selection Sort

for (i = 0; i < n-1; i++)
{
   minimum = i;
   for (j = i+1; j < n; j++)
   {
      if (sortArray[j] < sortArray[minimum]) minimum = j;
   }
   swap(sortArray, i, minimum);
}

Find Minimum n-1 times

Page 5: Sorting Data

Insertion Sort

for (i = 1; i < n; i++)
{
   j = i;
   save = sortArray[i];
   while (j > 0 && save < sortArray[j-1])
   {
      sortArray[j] = sortArray[j-1];
      j--;
   }
   sortArray[j] = save;
}

Insert next entry into a growing sorted table

Page 6: Sorting Data

Proof by Induction
• Select the base case (n = 1)
• State the hypothesis (assume for n = k)
• State what is to be proved (prove for n = k+1)
• Example:

Base case: for n = 1, the sum is 1 and 1 * 2 / 2 = 1

Hypothesis: assume for n = k, 1 + 2 + … + k = k(k+1)/2

To prove: 1 + 2 + … + (k+1) = (k+1)(k+2)/2

1 + 2 + … + (k+1) = [1 + 2 + … + k] + (k+1)

By the hypothesis, this equals k(k+1)/2 + (k+1)

= (k+1)(k/2 + 1) = (k+1)(k/2 + 2/2) = (k+1)(k+2)/2

Therefore, by induction, the relationship holds for all n >= 1
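The closed form proved here is easy to spot-check numerically; a minimal sketch (the class name SumFormula is illustrative):

```java
// Spot-check the closed form: for every n, 1 + 2 + ... + n must equal n(n+1)/2.
public class SumFormula {
    static boolean holdsUpTo(int limit) {
        long sum = 0;
        for (int n = 1; n <= limit; n++) {
            sum += n;                                   // running sum 1 + 2 + ... + n
            if (sum != (long) n * (n + 1) / 2) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsUpTo(100000));          // prints true
    }
}
```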

Page 7: Sorting Data

Recursion
Useful for advanced sorts and for divide & conquer algorithms

• Relationship to mathematical induction
• Key design principles
– Relationship between algorithm(n) and algorithm(m), where m < n
– Base case (how does it stop?)

• When is it useful? What is the overhead?
– Relationship between n and m
– Tail recursion with a single recursive call
– Replace by manually creating stacks

• Examples – simple loop, factorial, gcd, binary search, Tower of Hanoi

Page 8: Sorting Data

Recursion Examples

• Factorial: 5! = 5 * 4!

• Greatest Common Divisor: gcd(x, y) = gcd(y % x, x) if x < y

• Binary Search

int binSearch( array, low, high, value)
{
   if (low > high) return -1; // base case: value not found
   middle = (low + high) / 2;
   if (value < array[middle]) return binSearch(array, low, middle-1, value);
   else if (value > array[middle]) return binSearch(array, middle+1, high, value);
   else return middle;
}
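The first two examples above can be sketched as follows (binary search already appears on the slide; class and method names here are mine):

```java
// Sketches of the factorial and gcd recursion examples.
public class RecursionExamples {
    static long factorial(int n) {
        if (n <= 1) return 1;            // base case
        return n * factorial(n - 1);     // n! = n * (n-1)!
    }

    static int gcd(int x, int y) {
        if (x == 0) return y;            // base case
        return gcd(y % x, x);            // Euclid's algorithm
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
        System.out.println(gcd(12, 18));  // prints 6
    }
}
```

Both are tail-recursive except for the multiply in factorial; gcd can be rewritten as a simple loop, as the slide's "replace by manually creating stacks" point suggests.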

Page 9: Sorting Data

Breaking the O(N^2) Barrier
Based on either bubble or insertion sort
Complexity ranges from O(N^(7/6)) to O(N^(3/2)) depending on gap selection

• Shell Sort (based on insertion)

while (gap > 0)
{  for (index = gap; index < n; index++)
   {  temp = sortArray[index];
      compareIndex = index;
      while (compareIndex >= gap && sortArray[compareIndex - gap] >= temp)
      {  sortArray[compareIndex] = sortArray[compareIndex - gap];
         compareIndex -= gap;
      }
      sortArray[compareIndex] = temp;
   }
   adjustGap( gap ); // different gap patterns: gap /= 2, gap = (gap-1)/3, gap = (gap+1)/2
}

Page 10: Sorting Data

Shell sort (based on bubble)

int index;
while (gap > 0)
{  swaps = true;
   while (swaps)
   {  swaps = false;
      for (index = 0; index < n - gap; index++)   // keep index+gap in bounds
      {  if (sort[index] > sort[index + gap])
         {  swap(sort, index, index + gap);
            swaps = true;
         }
      }
   }
   adjustGap( gap );
}

Page 11: Sorting Data

Merge Sort
Always O(N lg N), but needs more memory

void mergeSort(double[] sortArray, int low, int high)
{  if (low == high) return;
   int mid = (low + high) / 2;
   mergeSort(sortArray, low, mid);
   mergeSort(sortArray, mid+1, high);
   merge(sortArray, low, mid, high);
}

• The merge method must:
– Allocate an array for copying
– Merge the two sorted halves together
– Copy the result back to the original array

Page 12: Sorting Data

Merge method

void merge(double[] sort, int low, int middle, int high)
{  int lowPtr = low, highPtr = middle + 1, spot = 0;
   double[] work = new double[high - low + 1];
   while (lowPtr <= middle && highPtr <= high)
   {  if (sort[lowPtr] < sort[highPtr]) work[spot++] = sort[lowPtr++];
      else work[spot++] = sort[highPtr++];
   }
   while (lowPtr <= middle) work[spot++] = sort[lowPtr++];
   while (highPtr <= high) work[spot++] = sort[highPtr++];
   lowPtr = low;
   for (spot = 0; spot < high - low + 1; spot++)
      sort[lowPtr++] = work[spot];
}

Page 13: Sorting Data

Analysis of Merge Sort

[Recursion-tree figure for n = 16:]

16
8  8
4  4  4  4
2  2  2  2  2  2  2  2

The work at each level totals 16; there are lg 16 = 4 levels, so the total work is 16 lg 16.

Page 14: Sorting Data

Quick Sort
O(N lg N) average case, in place

void quickSort(double[] sortArray, int left, int right)
{  if (right <= left) return;
   double pivot = sortArray[right];
   int middle = partition(sortArray, left, right, pivot);
   quickSort(sortArray, left, middle-1);
   quickSort(sortArray, middle+1, right);
}

• Refinements to avoid the O(N^2) worst case and speed things up:
– Choice of pivot
– Combining with insertion sort

• Other uses (find the kth biggest number)
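The "kth biggest number" use can be sketched with quickselect, which reuses the partition step to find the kth smallest (or, symmetrically, biggest) element in O(N) average time without fully sorting. This sketch is not from the slides; its partition is essentially the one on the next slide, with a small bounds guard added:

```java
// Quickselect sketch: partition once, then recurse into only one side.
public class QuickSelect {
    static void swap(double[] a, int i, int j) {
        double t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static int partition(double[] a, int left, int right, double pivot) {
        int origRight = right;                 // pivot lives at origRight
        left -= 1;
        for (;;) {
            while (a[++left] < pivot);         // pivot itself stops this scan
            while (right > 0 && a[--right] > pivot);
            if (left >= right) break;
            swap(a, left, right);
        }
        swap(a, left, origRight);              // put the pivot in place
        return left;
    }

    // Returns the kth smallest element (k is 0-based).
    static double select(double[] a, int k) {
        int left = 0, right = a.length - 1;
        while (true) {
            if (left == right) return a[left];
            int mid = partition(a, left, right, a[right]);
            if (k == mid) return a[mid];
            else if (k < mid) right = mid - 1;
            else left = mid + 1;
        }
    }

    public static void main(String[] args) {
        double[] data = {7, 2, 9, 4, 1, 8, 3};
        System.out.println(select(data, 3));   // prints 4.0 (4th smallest)
    }
}
```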

Page 15: Sorting Data

Quick Sort Partitioning

int partition(double[] sortArray, int left, int right, double pivot)
{  int origRight = right;
   left -= 1;
   for (;;)
   {  while (sortArray[++left] < pivot);             // pivot itself stops this scan
      while (right > 0 && sortArray[--right] > pivot);
      if (left >= right) break;
      swap(sortArray, left, right);
   }
   swap(sortArray, left, origRight);                 // put the pivot in place
   return left;
}

Page 16: Sorting Data

Radix Sort (First Version)

1. Choose the number of buckets (b)
2. Drop the next significant part of the data into the buckets
3. Gather the data from the buckets back into the original array
4. Repeat the above two steps, finishing at the most significant piece of data
5. Notes
   a. Maximum memory is needed for each bucket
   b. Complexity: O(p * 2n), where
      i. p = (Max + b - 1)/b
      ii. 2n because dropping and gathering touch each element twice
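The bucket scheme above can be sketched as follows, assuming b = 10 decimal buckets (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Bucket-based radix sort sketch: drop by the current digit, gather back, repeat.
public class RadixSort {
    static void radixSort(int[] a, int passes) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++) buckets.add(new ArrayList<>());
        int divisor = 1;
        for (int pass = 0; pass < passes; pass++) {
            for (int value : a)                          // drop by current digit
                buckets.get((value / divisor) % 10).add(value);
            int i = 0;
            for (List<Integer> bucket : buckets) {       // gather back in order
                for (int value : bucket) a[i++] = value;
                bucket.clear();
            }
            divisor *= 10;                               // next more significant digit
        }
    }

    public static void main(String[] args) {
        int[] data = {82, 41, 63, 92, 86, 21, 30, 76, 43, 94};
        radixSort(data, 2);  // two passes: all values < 10^2
        System.out.println(java.util.Arrays.toString(data));
    }
}
```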

Page 17: Sorting Data

Radix Sort Example

Original data:  82 41 63 92 86 21 30 76 43 94

Buckets after pass 1 (ones digit):
0: 30   1: 41,21   2: 82,92   3: 63,43   4: 94   6: 86,76

Gather:  30 41 21 82 92 63 43 94 86 76

Buckets after pass 2 (tens digit):
2: 21   3: 30   4: 41,43   6: 63   7: 76   8: 82,86   9: 92,94

Gather:  21 30 41 43 63 76 82 86 92 94

Notes:
1. Each pass uses one digit of the data: (x / 10^(pass-1)) % 10
2. Two passes suffice because the largest number < (number of buckets)^2
3. Complexity is O(2pn) = O(pn), where p is the number of passes
4. In this case at most two elements land in each bucket, but we couldn't depend on that in the general case

Page 18: Sorting Data

Refined Radix Sort

1. Create an array (Counts) of size buckets + 1
2. Initialize the array to zeroes
3. Store the actual bucket sizes into the Counts array (starting at index 1)
4. Perform a prefix sum on the Counts array to compute starting offsets
5. Use the Counts array to drop elements into a second array of numbers
6. Advantages:
   a. Alternating between the two arrays avoids the gather operations
   b. Only two times the memory is needed
7. Complexity: O(p(2n + 2b)) = O(p(n + b))
8. Notes:
   a. More buckets can reduce the number of passes, but prefix-sum overhead will limit the performance benefit
   b. Radix sort does no comparisons, so the O(n lg n) limitation doesn't apply
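The refined scheme above can be sketched as a single-pass routine (assuming b = 10; class and method names are mine):

```java
// One refined radix pass: count, prefix-sum, then drop into a second array.
public class RefinedRadix {
    static int[] radixPass(int[] src, int divisor) {
        int[] counts = new int[11];                    // buckets + 1, zero-initialized
        for (int value : src)
            counts[(value / divisor) % 10 + 1]++;      // bucket sizes at index + 1
        for (int b = 1; b < 11; b++)
            counts[b] += counts[b - 1];                // prefix sum -> start offsets
        int[] dest = new int[src.length];
        for (int value : src)                          // drop: no gather needed
            dest[counts[(value / divisor) % 10]++] = value;
        return dest;
    }

    public static void main(String[] args) {
        int[] data = {82, 41, 63, 92, 86, 21, 30, 76, 43, 94, 25};
        data = radixPass(data, 1);    // pass 1: ones digit
        data = radixPass(data, 10);   // pass 2: tens digit
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

The first pass reproduces the "Pass 1 Data" row on the next slide; the caller alternates between the two arrays instead of gathering.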

Page 19: Sorting Data

Refined Radix Example

• Elements are dropped from the original array straight into the alternate array

• No gather operation is needed

• The index used to store into the Counts array is one bigger than the bucket: Counts index 5 below holds 1 because it corresponds to bucket 4

Original data:   82 41 63 92 86 21 30 76 43 94 25

Counts index:     0  1  2  3  4  5  6  7  8  9 10
Counts:           0  1  2  2  2  1  1  2  0  0  0
Prefix sum:       0  1  3  5  7  8  9 11 11 11 11
Ending content:   1  3  5  7  8  9 11 11 11 11 11

Data offset:      0  1  2  3  4  5  6  7  8  9 10
Pass 1 data:     30 41 21 82 92 63 43 94 25 86 76

Page 20: Sorting Data

Optimal Comparison Sort

• There are n! possible orderings of the input
• Any comparison sort can be modeled with a decision tree
• The optimal sort corresponds to a completely balanced tree
• The depth of the balanced decision tree is O(lg(n!))

[Decision-tree figure: each internal node is a compare whose <= and > branches lead to further compares]

Page 21: Sorting Data

Prove optimal sort = O(n lg n)

• Optimal comparison sort <= O(n lg n):
lg(n!) = lg(n) + lg(n-1) + lg(n-2) + … + lg(1)
       < lg n + lg n + lg n + … + lg n = n lg n = O(n lg n)

• Optimal comparison sort >= O(n lg n):
lg(n!) = lg(n) + … + lg(n/2+1) + lg(n/2) + … + lg(n/4+1) + lg(n/4) + … + lg(n/8+1) + …
       > (n/2) lg(n/2) + (n/4) lg(n/4) + (n/8) lg(n/8) + …
       = (n/2) lg n – (n/2) lg 2 + (n/4) lg n – (n/4) lg 4 + (n/8) lg n – (n/8) lg 8 + …
       ≈ n lg n – (1/2)n – (2/4)n – (3/8)n – (4/16)n – …
       = n lg n – n(1/2 + 2/4 + 3/8 + 4/16 + …) = n lg n – 2n
But O(n lg n – 2n) = O(n lg n)

• Therefore the optimal comparison sort is O(n lg n)

• The series is well known: 1/2 + 2/4 + 3/8 + … = Σ n/2^n = 2
• Proof (let S be the sum; then 2S – S = S):
2S – S = (1 + 1 + 6/8 + 8/16 + 10/32 + …) – (1/2 + 2/4 + 3/8 + …)
       = 1 + 1/2 + 1/4 + 1/8 + … = 2
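The claimed value of the series is easy to confirm numerically; a minimal sketch (the class name is illustrative):

```java
// Numeric check of the series used above: the sum of n / 2^n converges to 2.
public class SeriesSum {
    static double partialSum(int terms) {
        double sum = 0, power = 2;        // power = 2^n
        for (int n = 1; n <= terms; n++) {
            sum += n / power;
            power *= 2;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(partialSum(60));  // very close to 2.0
    }
}
```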