COP 3540 Data Structures with OOP


Page 1: COP 3540 Data Structures with OOP

1/20

COP 3540 Data Structures with OOP

Chapter 7 - Part 2: Advanced Sorting

Page 2: COP 3540 Data Structures with OOP

2/20

Quicksort

Very famous and popular.

For many (not all) cases, it provides excellent performance, generally O(n log₂ n).

Excellent for internal sorting (not disk files).

Quick sort is based on Partitioning

Operates by partitioning an array into two parts, as expected.

Calls itself recursively to quicksort each of the two partitions.

Page 3: COP 3540 Data Structures with OOP

3/20

Consider the algorithm:

public void recQuickSort(int left, int right) {
    if (right - left <= 0)
        return;                                     // base case: if size <= 1, it is already sorted
    else {
        int partition = partitionIt(left, right);   // can you explain this code?
        recQuickSort(left, partition - 1);          // sort left side recursively
        recQuickSort(partition + 1, right);         // sort right side recursively
    }
}  // end recQuickSort()

Note: We first check whether we have the trivial case: the base case. If not, go!

Note that we then partition the array into smaller (left) and larger (right) keys.

Note: nothing here says yet what ‘left’ and ‘right’ contain, or where the pivot is.

Now the recursive routine: sort the left side recursively, and then the right side recursively. But each recursive call to recQuickSort invokes the partition algorithm again, followed by another recursive call to recQuickSort (on the left…) again…

Page 4: COP 3540 Data Structures with OOP

4/20

So what does this actually mean? Consider the real operation here:

We keep calling recQuickSort recursively until the base case is encountered (eventually it will be).

Each call invokes the partition algorithm again (and again and again), which successively divides the left side into two subarrays: a ‘smaller left side’ and a ‘smaller right side’ of the original left side…

…and then makes a recursive call to recQuickSort (on the left…) again, and again, and again…

Note: as we keep on going to the left… to the left, there is a corresponding right ‘side’ that is also becoming smaller and smaller…

So, we are sorting the subarrays by recursively calling ourselves and executing the partitioning algorithm as the first step in each call!

We continue to partition and ultimately arrive at a base case. From this ‘smallest of arrays’ we return and, for the first time, recursively call the right subarray. That call ‘essentially’ starts over: it partitions (if the subarray holds more than one item) and recursively calls its own left subarray… over and over and over…
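The order of the recursive calls can be made visible with a small trace. This is only a sketch under stated assumptions: the class name `RecursionTrace`, the extra `depth` parameter, the `calls` list, and the simple single-scan `partitionIt` (rightmost item as pivot) are illustrative choices, not the code from the slides or the book.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecursionTrace {
    // Hypothetical helper: records each call as "left..right" in call order.
    static final List<String> calls = new ArrayList<>();

    static void recQuickSort(long[] a, int left, int right, int depth) {
        calls.add(left + ".." + right);
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");
        System.out.println(indent + "recQuickSort(" + left + ", " + right + ")");

        if (right - left <= 0) return;              // base case: size <= 1
        int partition = partitionIt(a, left, right);
        recQuickSort(a, left, partition - 1, depth + 1);   // left side first
        recQuickSort(a, partition + 1, right, depth + 1);  // right side afterwards
    }

    // Assumed stand-in partition: one left-to-right scan, rightmost item as pivot.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int boundary = left;                        // next slot for an item smaller than the pivot
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) {
                swap(a, i, boundary);
                boundary++;
            }
        }
        swap(a, boundary, right);                   // pivot lands between the two groups
        return boundary;
    }

    static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] data = {20, 10, 40, 30};
        recQuickSort(data, 0, data.length - 1, 0);
        System.out.println(Arrays.toString(data));
    }
}
```

In the printed trace, the indented calls on the left sub-arrays all appear before the first call on a right sub-array, which is the "to the left, to the left" behavior described above.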

Page 5: COP 3540 Data Structures with OOP

5/20

Selection of the Pivot Value

The partition method requires a pivot value to do the partitioning.

Ideally, the pivot value should be one of the key values you are trying to sort.

Simple approach: select the rightmost item of the sub-array being partitioned. (At least this is an element in the array to be sorted.)

After the partition, it would be nice if this pivot were in its final place between the left and right sub-arrays. But we cannot assert this yet. The pivot still sits at the right end of the sub-array; we only know that all items in the left group are <= the pivot value and all items in the right group are >= the pivot value, just not sorted.

Page 6: COP 3540 Data Structures with OOP

6/20

Pivot Values

But since all values in the right group are greater than or equal to the pivot and are unsorted, we merely swap the pivot with the item at the left-scan position at the conclusion of the partitioning (when left scan >= right scan).

This will put the original pivot in its final position… (See?)

And we have our original array partitioned at the place where the left-scan item and the original pivot were exchanged. (Remember, the left scan proceeded to the right until it found an item not less than the pivot value; the right scan proceeded to the left until it found an item not greater than the pivot value. At the end of the scan, left >= right and we can draw the conclusion above…)

This works because every element in the left partition is no greater than the pivot and every element in the right partition is no less than it, so the slot at the left-scan position is exactly where the pivot belongs. We now have two smaller sub-arrays.

Now go to the left sub-array and partition this, etc., recursively.

Using this approach of selecting the rightmost item in a sub-array as the pivot requires minor changes in the quicksort routine. Reflected in quickSort1.java, ahead.
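The slides call partitionIt but never list it. The following is a minimal sketch of a two-scan partition with the rightmost item as pivot, written in the spirit of the description above. The class name `PartitionSketch`, the array passed as a parameter (instead of the `theArray` instance variable), and the `rightPtr > left` guard are assumptions made for illustration; the book's own listing may differ in detail.

```java
import java.util.Arrays;

class PartitionSketch {
    // Partitions a[left..right] around the pivot a[right] and returns the
    // pivot's final index.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int leftPtr = left - 1;     // one before the first item to scan
        int rightPtr = right;       // the pivot slot, one after the last item to scan
        while (true) {
            // left scan: move right until an item >= pivot is found
            // (it stops at the pivot itself at the latest)
            while (a[++leftPtr] < pivot) { }
            // right scan: move left until an item <= pivot is found
            while (rightPtr > left && a[--rightPtr] > pivot) { }
            if (leftPtr >= rightPtr) break;     // scans have met or crossed
            swap(a, leftPtr, rightPtr);         // both items are on the wrong side
        }
        swap(a, leftPtr, right);                // put the pivot into its final place
        return leftPtr;
    }

    static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] data = {50, 20, 70, 10, 60, 40};
        int p = partitionIt(data, 0, data.length - 1);
        System.out.println("pivot index = " + p + ", array = " + Arrays.toString(data));
    }
}
```

Both pointers are moved (`++leftPtr`, `--rightPtr`) before the array is read, which is why starting them one position outside the scanned range is safe.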

Page 7: COP 3540 Data Structures with OOP

7/20

Of course, we have the driver:

class QuickSort1App {
    public static void main(String[] args) {
        int maxSize = 16;                        // array size
        ArrayIns arr;
        arr = new ArrayIns(maxSize);             // create the array
        for (int j = 0; j < maxSize; j++) {      // fill the array with random numbers
            long n = (int)(java.lang.Math.random() * 99);   // remember how random() worked??
            arr.insert(n);
        }  // end for
        arr.display();                           // display items
        arr.quickSort();                         // quicksort them (here’s the easy stuff)
        arr.display();                           // display them again
    }  // end main()
}  // end class QuickSort1App

Page 8: COP 3540 Data Structures with OOP

8/20

recQuickSort itself:

public void recQuickSort(int left, int right) {
    if (right - left <= 0)              // if size <= 1, already sorted (base case)
        return;
    else {                              // size is 2 or larger
        long pivot = theArray[right];   // rightmost item (note argument)
                                        // (theArray is an instance variable)
        int partition = partitionIt(left, right, pivot);   // send pivot to partition
        // partitionIt is modified: when done, it swaps the left-scan item with the pivot

        recQuickSort(left, partition - 1);    // sort left side
        // note: we’re one ‘in’ from where we moved the original pivot
        // Each time we call recQuickSort and do the partitioning, the new
        // pivot IS in its right place with respect to the new sub-array.
        // So little by little, elements are moved to their correct positions…

        recQuickSort(partition + 1, right);   // sort right side (takes a while to get here!)
        // But note how the recursion works, and what happens when the return occurs
    }  // end if
}  // end recQuickSort()

Page 9: COP 3540 Data Structures with OOP

9/20

Let’s look at the applet: QuickSort1.html

Show Lafore Applets… Show quicksort1, size = 100, random. The dashed line shows subarrays.

Can see the pivot points selected and that the algorithm successively goes to the left, to the left to the left and to the right and then to the left, to the left, etc.

Successively smaller subarrays are created.

Sample: swaps: 170; comparisons: 663.

If you wish to spend some time on these, the book gives a VERY detailed explanation of the solid line, dashed line, etc.

Page 10: COP 3540 Data Structures with OOP

10/20

Some particulars (Things to Notice)

Looking at the code in the algorithm:

The left scan starts at left-1 and the right scan starts at right (both outside the range left..right-1 that is actually scanned).

But each will be incremented / decremented before it is used to access the array for the first time.

So, not to worry…

Page 11: COP 3540 Data Structures with OOP

11/20

QuickSort1 can provide horrible performance!

What if: 100 bars inversely sorted: Swaps 99; Comparisons 5098!

More and larger subarrays are being processed. The problem is in the selection of the pivot.

This really comes to bear if the data is way out of whack! It may not be inversely sorted, but it may contain extraneous / extreme values.

This would certainly impact the choice of a pivot and resulting size of the sub-arrays.

Ideally, perhaps the pivot should be median?

Seems like this might provide better performance?

Page 12: COP 3540 Data Structures with OOP

12/20

A problem:

When sub-arrays are out of balance (as with extreme values or skewed data), the array must be divided more times, causing degraded performance.

In inversely-sorted data (that is, data comes in descending and we want to sort it ascending), each partition peels off only the pivot, so we have sub-arrays of size n-1, n-2, …, down to 1 as we progress.

This phenomenon degenerates the sort into an O(n²) sort!!! (Recall: Quicksort1 used rightmost element as pivot)
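The quadratic behavior can be checked by counting key comparisons. The sketch below is an illustration under assumptions: `ComparisonCount`, `countFor`, and `scrambled` are made-up names, and the partition is a simple single-scan variant rather than the applet's two-scan one, so the counts will not match the applet's figures. With this variant a sub-array of m items costs m-1 comparisons, so inversely sorted input of size n costs (n-1) + (n-2) + … + 1 = n(n-1)/2 comparisons.

```java
class ComparisonCount {
    static long comparisons;    // number of key comparisons made so far

    static void recQuickSort(long[] a, int left, int right) {
        if (right - left <= 0) return;
        int p = partitionIt(a, left, right);
        recQuickSort(a, left, p - 1);
        recQuickSort(a, p + 1, right);
    }

    // Assumed single-scan partition with the rightmost item as pivot.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int boundary = left;
        for (int i = left; i < right; i++) {
            comparisons++;                       // one comparison per scanned item
            if (a[i] < pivot) {
                long t = a[i]; a[i] = a[boundary]; a[boundary] = t;
                boundary++;
            }
        }
        long t = a[boundary]; a[boundary] = a[right]; a[right] = t;
        return boundary;
    }

    static long countFor(long[] a) {
        comparisons = 0;
        recQuickSort(a, 0, a.length - 1);
        return comparisons;
    }

    // Descending input n, n-1, ..., 1.
    static long[] reversed(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) a[i] = n - i;
        return a;
    }

    // A fixed, non-sorted arrangement of distinct values (hypothetical test data).
    static long[] scrambled(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) a[i] = (i * 37L) % 101;   // distinct for n <= 101
        return a;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println("inversely sorted: " + countFor(reversed(n)) + " comparisons");
        System.out.println("scrambled:        " + countFor(scrambled(n)) + " comparisons");
    }
}
```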

Page 13: COP 3540 Data Structures with OOP

13/20

That’s not the only problem with inversely sorted arrays!

Because about n partitions are required, one nested inside the other, the recursion becomes very deep (on the order of n nested calls).

For large n this could cause a stack overflow in the system and may cause your operating system to hang!

So, in QuickSort1, choosing the rightmost element as the pivot point may be good if the data is really random.

If the data is inversely sorted, this selection of a pivot is disastrous and degenerates the sort into an O(n²) sort, losing all the potential advantages!
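How deep the recursion goes can be measured directly. As before, this is a sketch with assumed names (`RecursionDepth`, `depthFor`) and an assumed single-scan partition with the rightmost item as pivot; it only illustrates that the nesting depth grows with n on inversely sorted input.

```java
class RecursionDepth {
    static int maxDepth;    // deepest level of nested recQuickSort calls seen

    static void recQuickSort(long[] a, int left, int right, int depth) {
        if (depth > maxDepth) maxDepth = depth;
        if (right - left <= 0) return;
        int p = partitionIt(a, left, right);
        recQuickSort(a, left, p - 1, depth + 1);
        recQuickSort(a, p + 1, right, depth + 1);
    }

    // Assumed single-scan partition with the rightmost item as pivot.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int boundary = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) {
                long t = a[i]; a[i] = a[boundary]; a[boundary] = t;
                boundary++;
            }
        }
        long t = a[boundary]; a[boundary] = a[right]; a[right] = t;
        return boundary;
    }

    static int depthFor(long[] a) {
        maxDepth = 0;
        recQuickSort(a, 0, a.length - 1, 1);    // the outermost call counts as depth 1
        return maxDepth;
    }

    public static void main(String[] args) {
        int n = 100;
        long[] reversed = new long[n];
        long[] scrambled = new long[n];
        for (int i = 0; i < n; i++) {
            reversed[i] = n - i;                // descending input
            scrambled[i] = (i * 37L) % 101;     // fixed non-sorted arrangement
        }
        System.out.println("inversely sorted: depth " + depthFor(reversed));
        System.out.println("scrambled:        depth " + depthFor(scrambled));
    }
}
```

In Java, recursion that is too deep surfaces as a `StackOverflowError`; whether that happens depends on n and on the thread's stack size.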

Need a better approach.

Page 14: COP 3540 Data Structures with OOP

14/20

Median of Three Partitioning

Need a better approach to avoid selecting the largest or smallest value as the pivot.

How to do this?

Take the median of first, last, and middle elements and use this as pivot.

Faster than examining all elements.

Avoids selecting the largest or smallest.

May still have a bad number, but this approach is pretty sound.
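A median-of-three helper might look like the sketch below. It orders the first, middle, and last items in place and returns the middle one. The class name and the array parameter are assumptions for illustration; the book's medianOf3 also moves the median next to the right end before partitioning, a detail omitted here.

```java
import java.util.Arrays;

class MedianOfThreeSketch {
    // Orders a[left], a[center], a[right] in place and returns the median value.
    static long medianOf3(long[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[left] > a[center]) swap(a, left, center);    // smaller of the two moves left
        if (a[left] > a[right]) swap(a, left, right);      // smallest of the three is now at left
        if (a[center] > a[right]) swap(a, center, right);  // largest of the three is now at right
        return a[center];                                  // what remains in the middle is the median
    }

    static void swap(long[] a, int i, int j) {
        long tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        long[] data = {90, 10, 50, 20, 30};
        long median = medianOf3(data, 0, data.length - 1);
        System.out.println("median of three = " + median + ", array = " + Arrays.toString(data));
    }
}
```

For inversely sorted input the first and last items are the extremes, so the middle item is chosen as the pivot and the split is close to even.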

Page 15: COP 3540 Data Structures with OOP

15/20

Applet: Run QuickSort2 using Median of Three Partitioning.

With the applet set to 100 inversely sorted bars, we see:

QuickSort1: 100 bars inversely sorted: Swaps: 99; Comparisons: 5098!

QuickSort2: 100 bars inversely sorted: Swaps: 217; Comparisons: 712!

Page 16: COP 3540 Data Structures with OOP

16/20

Constraints

The median-of-three approach cannot be used for partitions of three or fewer items, so quicksort with this partitioning cannot sort such small partitions by itself.

For small partitions, we might want to use the insertion sort so we don’t have to worry about the cutoff = 3 for the median of three partitioning. Studies are available on different cutoff sizes…

Your book presents the algorithm in Listing 7.5 where an insertion sort is used to handle sub-arrays with fewer than 10 cells. This makes sense for small ‘n.’

Let’s look at the operative routine…QuickSort3.

(Quicksort1 used rightmost element as pivot; Quicksort2 used median-of-three as pivot…)

Page 17: COP 3540 Data Structures with OOP

17/20

recQuickSort method in QuickSort3:

public void recQuickSort(int left, int right) {
    int size = right - left + 1;
    if (size < 10)                       // insertion sort if small
        insertionSort(left, right);
    else {                               // quicksort if large
        long median = medianOf3(left, right);
        int partition = partitionIt(left, right, median);   // fed the median as the pivot; all else, same
        recQuickSort(left, partition - 1);
        recQuickSort(partition + 1, right);
    }  // end if
}  // end recQuickSort()

(QuickSort3 uses insertionSort for sub-arrays < 10 and median-of-three for pivot selection for sub-arrays >= 10)
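The insertionSort(left, right) helper used for small sub-arrays is not shown on the slides. A minimal sketch of an insertion sort restricted to a range could look as follows; the class name `InsertionSortRange` and the array parameter are assumptions, and the book's Listing 7.5 may differ.

```java
import java.util.Arrays;

class InsertionSortRange {
    // Sorts only a[left..right]; items outside that range are not touched.
    static void insertionSort(long[] a, int left, int right) {
        for (int out = left + 1; out <= right; out++) {
            long temp = a[out];                 // item to insert into the sorted part
            int in = out;
            while (in > left && a[in - 1] >= temp) {
                a[in] = a[in - 1];              // shift larger items one slot right
                in--;
            }
            a[in] = temp;                       // drop the item into the gap
        }
    }

    public static void main(String[] args) {
        long[] data = {9, 8, 5, 3, 7, 1, 0};
        insertionSort(data, 1, 4);              // sort only indices 1..4
        System.out.println(Arrays.toString(data));
    }
}
```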

Page 18: COP 3540 Data Structures with OOP

18/20

Efficiency of QuickSort

One older approach uses a stack to store deferred array bounds and uses loops instead of recursive calls to oversee the partitioning of smaller and smaller sub-arrays.

The goal was to eliminate costly recursive method calls and costly system overhead.

Older machines paid real performance penalties for successive function calls. Not really a big deal nowadays!
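The stack-based approach can be sketched as follows: sub-array bounds that still need sorting are pushed on an explicit stack, and a loop pops and partitions them. The names `IterativeQuickSort` and `pending` and the single-scan partition are assumptions for illustration, not a listing from the book.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class IterativeQuickSort {
    static void quickSort(long[] a) {
        Deque<int[]> pending = new ArrayDeque<>();      // deferred {left, right} bounds
        pending.push(new int[] {0, a.length - 1});
        while (!pending.isEmpty()) {
            int[] bounds = pending.pop();
            int left = bounds[0];
            int right = bounds[1];
            if (right - left <= 0) continue;            // size <= 1: nothing to do
            int p = partitionIt(a, left, right);
            pending.push(new int[] {p + 1, right});     // right side is deferred
            pending.push(new int[] {left, p - 1});      // left side is handled next
        }
    }

    // Assumed single-scan partition with the rightmost item as pivot.
    static int partitionIt(long[] a, int left, int right) {
        long pivot = a[right];
        int boundary = left;
        for (int i = left; i < right; i++) {
            if (a[i] < pivot) {
                long t = a[i]; a[i] = a[boundary]; a[boundary] = t;
                boundary++;
            }
        }
        long t = a[boundary]; a[boundary] = a[right]; a[right] = t;
        return boundary;
    }

    public static void main(String[] args) {
        long[] data = {64, 21, 33, 70, 12, 85, 44, 3, 99, 0};
        quickSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Pushing the right bounds first and the left bounds second means the left sub-array is processed next, mirroring the order of the recursive version.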

Page 19: COP 3540 Data Structures with OOP

19/20

Efficiency of QuickSort

QuickSort operates in O(n log₂ n) time on average, which is very good.

• (Recall Shell Sort operated in O(n (log₂ n)²) time)

Quicksort is typical of divide-and-conquer algorithms, where targets are successively divided into smaller and smaller ‘halves’ which are processed recursively.

No need to plow deeper into this algorithm.

Except for some real fine tuning, we have the idea of how to use this.

Page 20: COP 3540 Data Structures with OOP

20/20

Comparisons

See Table 7.2:

O() value     Type of Sort   n = 10   n = 100   n = 1,000    n = 10,000
n²            Insertion      100      10,000    1,000,000    100,000,000
n(log n)²     Shell Sort     10       400       9,000        160,000
n log n       QuickSort      10       200       3,000        40,000
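The growth figures can be recomputed directly. The sketch below assumes base-10 logarithms, which is what reproduces the Shell sort and quicksort rows; the class and method names are made up for illustration.

```java
class GrowthTable {
    static long nSquared(int n) {
        return (long) n * n;
    }

    static long nLogSquared(int n) {
        long lg = Math.round(Math.log10(n));    // log base 10 (assumption)
        return n * lg * lg;
    }

    static long nLog(int n) {
        return n * Math.round(Math.log10(n));
    }

    public static void main(String[] args) {
        int[] sizes = {10, 100, 1000, 10000};
        for (int n : sizes) {
            System.out.println("n = " + n
                    + ": n^2 = " + nSquared(n)
                    + ", n(log n)^2 = " + nLogSquared(n)
                    + ", n log n = " + nLog(n));
        }
    }
}
```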

Page 21: COP 3540 Data Structures with OOP

21/20

Read about the Radix Sort!! Study end of chapter questions and terms.