Sorting

-------

Sorting data is very important in computer applications.

We will look at 5 different methods of sorting arrays:

Algorithm         Best     Average  Worst    Time to sort    Advantages          Disadvantages
                  Case     Case     Case     10000 ints in
                                             range 1 - 200
                                             on Dell
----------------  -------  -------  -------  --------------  ------------------  -----------------------
Bubble Sort       O(n)     O(n^2)   O(n^2)   34.4 sec        Simple              Inefficient
with improvement

Insertion Sort    O(n)     O(n^2)   O(n^2)   6.75 sec        Fairly fast, esp.   Lots of swaps
                                                             if array is
                                                             somewhat sorted

Selection Sort    O(n^2)   O(n^2)   O(n^2)   10.0 sec        Minimal swaps       Always takes the same
                                                                                 amt of time even if
                                                                                 array is somewhat
                                                                                 sorted

Merge Sort        O(nlgn)  O(nlgn)  O(nlgn)  .169 sec        Fairly fast         Requires auxiliary
                                                                                 array (extra memory)

Quick Sort        O(nlgn)  O(nlgn)  O(n^2)   .067 sec        Very fast in        Very bad when array is
                                                             average case;       sorted or in
                                                             does sorting        reverse-sorted order;
                                                             in place            complicated

Bubble Sort

-----------

You may be familiar with Bubble Sort - it is a very simple type of

sort, but rather inefficient (and discredited by most

"computer scientists"). However, it is easy to implement and easy to

remember and not that bad for a small amount of data. It was fine

sorting about 1500 integers in the range from 1 - 200 on my

Dell machine.

The basic idea for an array with n elements is to make n - 1 passes

through the array exchanging adjacent elements that are out of order.

The smaller numbers "bubble up" to the top of the array. The largest

value is guaranteed to sink to the bottom during the first pass.

DEMO

For n elements, how many comparisons do we have to make?

2 nested loops that are each executed n - 1 times:

(n - 1) * (n - 1) = n^2 - 2n + 1

As n becomes very large, the dominant term is n^2.

If an array with 1 element takes 1 ns to sort,

an array with 1000 elements takes 1000000 ns to sort.

So the amount of time to sort n elements is proportional to n^2

or O(n^2)
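The count above can be checked with a small sketch. It assumes the basic version of bubble sort in which both loops run n - 1 times; the class name BubbleComparisonCount and the method sortAndCount are made up for this illustration and are not part of the notes.

```java
// Hypothetical helper for illustration only.
class BubbleComparisonCount {
    // Runs the basic bubble sort (both loops execute n - 1 times)
    // and returns the number of comparisons it made.
    static long sortAndCount(int[] array) {
        long comparisons = 0;
        for (int i = 0; i < array.length - 1; i++) {
            for (int j = 0; j < array.length - 1; j++) {
                comparisons++;
                if (array[j] > array[j + 1]) {
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            int[] data = new int[n];
            for (int k = 0; k < n; k++) {
                data[k] = n - k; // descending input
            }
            // Prints (n - 1) * (n - 1): 81, 9801, 998001
            System.out.println(n + " elements: " + sortAndCount(data) + " comparisons");
        }
    }
}
```

Going from 10 to 1000 elements multiplies n by 100 and the comparison count by roughly 100^2, which is the n^2 growth described above.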

We could actually decrease the time by checking whether we make any
exchanges on a particular pass. If we don't, the array is sorted and
we can stop.

If we did this, what kind of array would be the "best case", i.e.
would take the least amount of time to run for BubbleSort? A sorted
array, which would only require one pass through the array or about n
comparisons.

What would be the worst case? An array in descending sorted order,
which would still require on the order of n^2 comparisons.

Here is the code for BubbleSort - you could add a check to see

if any exchanges were made on a particular pass through the

array and quit if there weren't any.

public class BubbleSort {
    // Basic bubble sort: n - 1 passes, each comparing every adjacent pair.
    public static void sort(int[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            for (int j = 0; j < array.length - 1; j++) {
                if (array[j] > array[j + 1]) {
                    // Exchange adjacent elements that are out of order.
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
    }
}

Improved version of BubbleSort:

public class BubbleSort {
    public static void sort(int[] array) {
        boolean done = false;
        // Stop early once a full pass makes no exchanges.
        for (int i = 0; i < array.length - 1 && !done; i++) {
            done = true;
            // The last i elements are already in their final positions.
            for (int j = 0; j < array.length - 1 - i; j++) {
                if (array[j] > array[j + 1]) {
                    done = false;
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
    }
}
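To see what the improved version buys, the sketch below counts comparisons for a sorted and a descending array. It copies the loop structure of the improved BubbleSort; the class name ImprovedBubbleCount and its helper methods are assumptions made for this example only.

```java
// Hypothetical helper for illustration only.
class ImprovedBubbleCount {
    // Improved bubble sort (early exit plus shrinking inner loop);
    // returns the number of comparisons made.
    static long sortAndCount(int[] array) {
        long comparisons = 0;
        boolean done = false;
        for (int i = 0; i < array.length - 1 && !done; i++) {
            done = true;
            for (int j = 0; j < array.length - 1 - i; j++) {
                comparisons++;
                if (array[j] > array[j + 1]) {
                    done = false;
                    int tmp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    static int[] ascending(int n) {
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = k + 1;
        }
        return data;
    }

    static int[] descending(int n) {
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = n - k;
        }
        return data;
    }

    public static void main(String[] args) {
        // Best case: one pass, n - 1 comparisons -> 999
        System.out.println("sorted:     " + sortAndCount(ascending(1000)));
        // Worst case: n(n - 1)/2 comparisons -> 499500
        System.out.println("descending: " + sortAndCount(descending(1000)));
    }
}
```

The worst case is about half of the (n - 1)^2 comparisons of the basic version, but it still grows as n^2.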

Insertion Sort

--------------

Insertion sort is something like sorting a hand of playing cards from

left to right. Each successive card is inserted in the correct

position.

DEMO

What is the best case for insertion sort - the case that requires the

least number of comparisons? the worst case? the average case?

best O(n) worst O(n^2) average O(n^2)

On my machine, sorting 3000 integers in the range 1 - 200 took about
the same amount of time as sorting 1500 integers using BubbleSort.

In the worst case we are doing at most the sum from 1 to n
comparisons, which is equal to

n(n + 1) / 2 = 1/2 n^2 + 1/2 n

or roughly half of the n^2 comparisons of the basic BubbleSort, which
helps explain why we could sort twice as many integers as BubbleSort
in the same amount of time.

public static void sort(int[] array) {
    for (int i = 1; i < array.length; i++) {
        int toBeInserted = array[i];
        int j;
        // Shift larger elements one position right to open a slot.
        for (j = i - 1; j >= 0 && toBeInserted < array[j]; j--)
            array[j + 1] = array[j];
        array[j + 1] = toBeInserted;
    }
}
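The best and worst cases can be checked by counting element comparisons in the same loop. This is only a sketch: the class InsertionSortCount and its methods are invented for the illustration. Counting this way, the worst case comes to the sum from 1 to n - 1, which has the same 1/2 n^2 leading term.

```java
// Hypothetical helper for illustration only.
class InsertionSortCount {
    // Insertion sort that returns how many times toBeInserted
    // was compared against an array element.
    static long sortAndCount(int[] array) {
        long comparisons = 0;
        for (int i = 1; i < array.length; i++) {
            int toBeInserted = array[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;
                if (toBeInserted >= array[j]) {
                    break; // already in place, this pass ends
                }
                array[j + 1] = array[j];
                j--;
            }
            array[j + 1] = toBeInserted;
        }
        return comparisons;
    }

    static int[] ascending(int n) {
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = k + 1;
        }
        return data;
    }

    static int[] descending(int n) {
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = n - k;
        }
        return data;
    }

    public static void main(String[] args) {
        // Best case (sorted): one comparison per element -> 999
        System.out.println("sorted:     " + sortAndCount(ascending(1000)));
        // Worst case (descending): 1 + 2 + ... + 999 -> 499500
        System.out.println("descending: " + sortAndCount(descending(1000)));
    }
}
```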

Selection Sort

--------------

Selection sort involves finding the smallest integer in the array and

exchanging it with the first integer in the array, then finding the

next smallest integer in the array and exchanging it with the second

integer in the array, etc. until the last element is reached

which is already in the correct position by default.

DEMO

Best, worst, and average cases all require n(n-1)/2 comparisons, or
about 1/2 n^2, which is O(n^2).

Selection sort took longer on my machine than insertion sort in the
average case, probably because an insertion sort pass ends as soon as
the inserted element is in place: 8 sec/10000 for insertion sort vs.
10 sec/10000 for selection sort (45 sec for bubble sort).

public static void sort(int[] array) {
    for (int i = 0; i < array.length - 1; i++) {
        // Find the smallest element in array[i..end].
        int min = array[i];
        int minPos = i;
        for (int j = i + 1; j < array.length; j++)
            if (array[j] < min) {
                min = array[j];
                minPos = j;
            }
        // Exchange it with the element at position i, if needed.
        if (minPos != i) {
            int tmp = array[i];
            array[i] = min;
            array[minPos] = tmp;
        }
    }
}
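A short sketch can confirm both properties of selection sort: the comparison count does not depend on the input order, and there is at most one exchange per pass. The class SelectionSortCount and its return convention (a two-element array) are assumptions made for this example.

```java
// Hypothetical helper for illustration only.
class SelectionSortCount {
    // Selection sort that returns {comparisons, swaps}.
    static long[] sortAndCount(int[] array) {
        long comparisons = 0;
        long swaps = 0;
        for (int i = 0; i < array.length - 1; i++) {
            int minPos = i;
            for (int j = i + 1; j < array.length; j++) {
                comparisons++;
                if (array[j] < array[minPos]) {
                    minPos = j;
                }
            }
            if (minPos != i) {
                int tmp = array[i];
                array[i] = array[minPos];
                array[minPos] = tmp;
                swaps++;
            }
        }
        return new long[] {comparisons, swaps};
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        for (int k = 0; k < n; k++) {
            sorted[k] = k + 1;
            reversed[k] = n - k;
        }
        long[] a = sortAndCount(sorted);
        long[] b = sortAndCount(reversed);
        // Both inputs need n(n - 1)/2 = 499500 comparisons.
        System.out.println("sorted:   " + a[0] + " comparisons, " + a[1] + " swaps");
        System.out.println("reversed: " + b[0] + " comparisons, " + b[1] + " swaps");
    }
}
```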

Merge Sort

----------

Merge sort is a faster sort than any of the ones we have looked at so

far. It involves successively cutting the array in half until each

array has only one element - then the arrays are merged.

DEMO

            8
        4       4              8      3 levels
      2   2   2   2            8
     1 1 1 1 1 1 1 1           8

8 * 3 comparisons = 24 comparisons for 8 elements

How is 3 related to 8?

log2(8) = 3, since 2^3 = 8

How about 16?

log2(16) = 4, so to sort 16 elements requires 16 * 4 comparisons = 64

MergeSort is O(nlgn), a big improvement over n^2 when n is very large:

Selection sort                 Merge sort
--------------                 ----------
10 ns / 10 elements            10 ns / 10 elements
? / 1024 elements              ? / 1024 elements
(1024)^2 ~ 1,000,000 ns        1024 log2 (1024) = 1024 * 10 = 10,240 ns
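The same arithmetic can be tabulated for a few sizes. This is a rough sketch that just evaluates n^2 and n * log2(n) for powers of two; the class GrowthComparison and its method names are made up for the example, and the values are operation counts, not measured times.

```java
// Hypothetical helper for illustration only.
class GrowthComparison {
    // log2(n) for n a power of two, using integer arithmetic.
    static int log2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    static long quadratic(int n) {
        return (long) n * n;
    }

    static long linearithmic(int n) {
        return (long) n * log2(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 16, 1024}) {
            // n = 8:    64 vs 24
            // n = 16:   256 vs 64
            // n = 1024: 1048576 vs 10240
            System.out.println(n + ": n^2 = " + quadratic(n)
                    + ", n log2 n = " + linearithmic(n));
        }
    }
}
```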

Big savings with just 1000 elements. With 10000 elements, merge sort
took about 1 sec on my machine.

The problem with merge sort is that it requires an extra array in

which to merge the smaller arrays.

The code here is even more wasteful of memory, but the coding is

fairly straightforward. There are better routines that use less

memory. Merge sort is an excellent use of recursion - it would be

very tedious to keep track of all of the halved arrays that needed to

be merged to create the final sorted array.
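As one possible example of a routine that uses less memory, the sketch below allocates a single auxiliary array up front and reuses it for every merge. It is not the code from these notes; the class name MergeSortOneBuffer and the buffer parameter are assumptions introduced for the illustration.

```java
import java.util.Arrays;

// Hypothetical variant for illustration only.
class MergeSortOneBuffer {
    static void sort(int[] array) {
        if (array.length < 2) {
            return; // nothing to sort
        }
        int[] buffer = new int[array.length]; // the only extra array
        mergeSort(array, buffer, 0, array.length - 1);
    }

    private static void mergeSort(int[] array, int[] buffer, int first, int last) {
        if (first >= last) {
            return; // one element is already sorted
        }
        int mid = first + (last - first) / 2;
        mergeSort(array, buffer, first, mid);
        mergeSort(array, buffer, mid + 1, last);

        // Copy the two sorted halves into the buffer, then merge back.
        System.arraycopy(array, first, buffer, first, last - first + 1);
        int j = first;   // next unused element of the first half
        int k = mid + 1; // next unused element of the last half
        for (int i = first; i <= last; i++) {
            if (k > last || (j <= mid && buffer[j] <= buffer[k])) {
                array[i] = buffer[j++];
            } else {
                array[i] = buffer[k++];
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}
```

The extra memory is still proportional to n, but it is allocated once instead of at every level of the recursion.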

public static void sort(int[] array) {

Page 6: Sorting

//Copy sorted array to original array when merge sort is complete

System.arraycopy

(mergeSort(array, 0, array.length - 1), 0, array, 0, array.length);

}

private static int[] mergeSort(int[] array, int first, int last) {

int [] newArray;

if (first == last) {

newArray = new int[1];

newArray[0] = array[first];

}

else {

int mid = (last - first) / 2 + first;

int [] firstHalf = mergeSort(array, first, mid);

int [] lastHalf = mergeSort(array, mid + 1, last);

newArray = new int[last - first + 1];

int i, j, k;

for(i = 0, j = 0, k = 0; i < newArray.length &&

j < firstHalf.length &&

k < lastHalf.length; i++) {

if (firstHalf[j] < lastHalf[k])

newArray[i] = firstHalf[j++];

else

newArray[i] = lastHalf[k++];

}

if (j < firstHalf.length)

for ( ; i < newArray.length; i++, j++)

newArray[i] = firstHalf[j];

else

for ( ; i < newArray.length; i++, k++)

newArray[i] = lastHalf[k];

}

//System.out.println(MergeSort.toString(newArray));

return newArray;

}

Quicksort

---------

Quicksort was invented by a man named C.A.R. Hoare in 1962. It is the

fastest known general purpose in-memory sorting algorithm in the

average case.

It works by partitioning the array around a "pivot" value so that one
part of the array contains the values less than or equal to the pivot
and the other part contains the values greater than or equal to the
pivot. Each part is then partitioned recursively until the array is
divided into n parts containing 1 element each. At this point the
array is sorted.
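A single partitioning step can be traced on a small array. The sketch below assumes the scheme used by the partition method in these notes (the first element is the pivot, two indexes move toward each other); the class PartitionDemo and the sample values are made up for the example.

```java
import java.util.Arrays;

// Hypothetical demo for illustration only.
class PartitionDemo {
    // Rearranges array[first..last] around the pivot array[first] and
    // returns the index where the first part ends.
    static int partition(int[] array, int first, int last) {
        int x = array[first];
        int i = first - 1;
        int j = last + 1;
        while (true) {
            do {
                j--;
            } while (array[j] > x);
            do {
                i++;
            } while (array[i] < x);
            if (i < j) {
                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            } else {
                return j;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2};
        int split = partition(data, 0, data.length - 1);
        System.out.println("split index: " + split);   // split index: 2
        System.out.println(Arrays.toString(data));     // [2, 3, 1, 8, 9, 5]
    }
}
```

With pivot 5, positions 0 to 2 hold values no larger than 5 and positions 3 to 5 hold values no smaller than 5; neither part is sorted yet.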

DEMO

The best case for Quicksort is when the partitions are always the
same size. For the best case or average case, the running time for
Quicksort is proportional to nlgn, just like merge sort. With the
first element used as the pivot (as in the code below), the worst case
for Quicksort is when the original array is either in sorted order or
reverse sorted order. That results in partitions such that the first
(or last) partition always contains 1 element and the other partition
contains the rest of the elements, so about n levels of partitioning
are needed and the running time grows as n^2.
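The lopsided partitions show up as recursion depth. The sketch below assumes the same first-element pivot and measures how deep the recursion goes on a sorted array and on a shuffled one; QuicksortDepth, sortAndDepth, and the seed value are all invented for this illustration.

```java
import java.util.Random;

// Hypothetical experiment for illustration only.
class QuicksortDepth {
    static int partition(int[] array, int first, int last) {
        int x = array[first]; // pivot: first element
        int i = first - 1;
        int j = last + 1;
        while (true) {
            do { j--; } while (array[j] > x);
            do { i++; } while (array[i] < x);
            if (i < j) {
                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            } else {
                return j;
            }
        }
    }

    // Sorts array[first..last] and returns the deepest level of
    // partitioning that was reached.
    static int sortAndDepth(int[] array, int first, int last) {
        if (first >= last) {
            return 0;
        }
        int mid = partition(array, first, last);
        int left = sortAndDepth(array, first, mid);
        int right = sortAndDepth(array, mid + 1, last);
        return 1 + Math.max(left, right);
    }

    static int[] ascending(int n) {
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = k;
        }
        return data;
    }

    // Fisher-Yates shuffle of 0..n-1 with a fixed seed.
    static int[] shuffled(int n, long seed) {
        int[] data = ascending(n);
        Random random = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int k = random.nextInt(i + 1);
            int tmp = data[i];
            data[i] = data[k];
            data[k] = tmp;
        }
        return data;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Sorted input: one element is split off per level, depth n - 1 = 999.
        System.out.println("sorted:   " + sortAndDepth(ascending(n), 0, n - 1));
        // Shuffled input: expected to be far shallower.
        System.out.println("shuffled: " + sortAndDepth(shuffled(n, 42L), 0, n - 1));
    }
}
```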

public class Quicksort {
    public static void sort(int[] array) {
        quicksort(array, 0, array.length - 1);
    }

    private static void quicksort(int[] array, int first, int last) {
        if (first < last) {
            //System.err.println(first + " " + last + " " + toString(array));
            int mid = partition(array, first, last);
            quicksort(array, first, mid);
            quicksort(array, mid + 1, last);
        }
    }

    // Partitions array[first..last] around the pivot array[first] and
    // returns the index of the last element of the first partition.
    private static int partition(int[] array, int first, int last) {
        int x = array[first];
        int i = first - 1;
        int j = last + 1;
        while (true) {
            // Move j left to an element that is not greater than the pivot.
            do {
                j--;
            } while (array[j] > x);
            // Move i right to an element that is not less than the pivot.
            do {
                i++;
            } while (array[i] < x);
            if (i < j) {
                int tmp = array[i];
                array[i] = array[j];
                array[j] = tmp;
            }
            else
                return j;
        }
    }
}