Sorting Algorithms I, cont’d - CS Home · 2013-07-18 · Sorting Algorithms I, cont’d Insertion...

21
Sorting Algorithms I, cont’d CS 311 Data Structures and Algorithms Lecture Slides Friday, October 10, 2008 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks [email protected] © 2005–2008 Glenn G. Chappell

Transcript of Sorting Algorithms I, cont’d - CS Home · 2013-07-18 · Sorting Algorithms I, cont’d Insertion...

Sorting Algorithms I, cont’d

CS 311 Data Structures and Algorithms

Lecture Slides

Friday, October 10, 2008

Glenn G. Chappell

Department of Computer ScienceUniversity of Alaska Fairbanks

[email protected]

© 2005–2008 Glenn G. Chappell

10 Oct 2008 CS 311 Fall 2008 2

Unit OverviewAlgorithmic Efficiency & Sorting

Major Topics

• Introduction to Analysis of Algorithms

• Introduction to Sorting

• Sorting Algorithms I

• More on Big-O

• The Limits of Sorting

• Divide-and-Conquer

• Sorting Algorithms II

• Sorting Algorithms III

• Radix Sort

• Practical Sorting

½�

10 Oct 2008 CS 311 Fall 2008 3

ReviewIntroduction to Analysis of Algorithms — Basics

Efficiency

• General: using few resources (time, space, bandwidth, etc.).

• Specific: fast (time).

Analyzing Efficiency

• Determine how the size of the input affects running time, measured in steps, in the worst case.

Scalable

• Works well with large problems.

10 Oct 2008 CS 311 Fall 2008 4

ReviewIntroduction to Analysis of Algorithms — Big-O [1/3]

Algorithm A is order f(n) [written O(f(n))] if

• There exist constants k and n0 such that

• A requires no more than k×f(n) time units to solve a problem of size n ≥ n0.

Big-O is important!

10 Oct 2008 CS 311 Fall 2008 5

ReviewIntroduction to Analysis of Algorithms — Big-O [2/3]

Here are the efficiency categories we will be using.

I will also allow O(n3), O(n4), etc.

Exponential timeO(bn), for some b > 1

Quadratic timeO(n2)

Log-linear timeO(n log n)

Linear timeO(n)

Logarithmic timeO(log n)

Constant timeO(1)

In WordsUsing Big-O

Cannot read all of input

Faster

Slower

Usually too slow to be scalable

10 Oct 2008 CS 311 Fall 2008 6

ReviewIntroduction to Analysis of Algorithms — Big-O [3/3]

Rules of Thumb• When determining big-O, we can collapse any constant number of steps into a single step.

• For nested loops:� If each is either executed n times, or executed i times, where i goes up to n (plus some constant?).

� Then the order is O(nt) where t is the number of loops.

for (int i = 0; i < n; ++i)for (int j = 0; j < i; ++j)

for (int k = j; k < i+4; ++k)++arr[j][k];

for (int i = 0; i < n; ++i)for (int j = 0; j < i; ++j)

for (int k = 0; k < 5; ++k)++arr[j][k];

O(n3)

O(n2)

10 Oct 2008 CS 311 Fall 2008 7

ReviewIntroduction to Sorting — The Basics

To sort a collection of data is to place it in order.

• The part of the data used to sort is the key.

We are interested primarily incomparison sorts: sortingalgorithms that get theirinformation by comparingitems in pairs.

A general-purpose comparisonsort places no restrictions onthe size of the list to be sorted, or the values in it (other than that they all have the same type).

3 1 3 5 25

1 2 3 5 53

x

yx<y?compare

10 Oct 2008 CS 311 Fall 2008 8

ReviewIntroduction to Sorting — Linked Lists

In addition to arrays, we are interested in sorting Linked Lists.

• A Linked List is composed of nodes. Each has a single data item and a pointer to the next node.

• These pointers are the only way to find the next data item.

• Once we have found a position within a Linked List, we can insert and delete very quickly [O(1) = constant time].

513 4 5 2

Head NULL pointerLinked

List

513 4 5 2

513 4 5 2

7

10 Oct 2008 CS 311 Fall 2008 9

ReviewIntroduction to Sorting — Analyzing

We analyze a general-purpose comparison sort using five criteria:• Efficiency

� What is the (worst-case) order of the algorithm?

� Is the algorithm much faster on average (over all possible inputs of a given size)?

• Requirements on Data� Does the algorithm require random-access data?

� Does it work with Linked Lists?

• Space Usage� Can the algorithm sort in-place?

• In-place = no large additional storage space required.

� How much additional storage (variables, buffers, etc.) is required?

• Stability� Is the algorithm stable?

• Stable = never changes order of equivalent items.

• Performance on Nearly Sorted Data� Is the algorithm faster when its input is already sorted or nearly sorted?

• Nearly sorted = (1) All items close to proper places, OR (2) few items out oforder.

10 Oct 2008 CS 311 Fall 2008 10

ReviewIntroduction to Sorting — Overview of Algorithms

There is no known sorting algorithm that has all the properties we would like one to have.

We will examine a number of sorting algorithms. Most of these fall into two categories: O(n2) and O(n log n).• Quadratic-Time [O(n2)] Algorithms

� Bubble Sort

� Selection Sort

� Insertion Sort

� Quicksort

� Treesort (later in semester)

• Log-Linear-Time [O(n log n)] Algorithms� Merge Sort

� Heap Sort (mostly later in semester)

� Introsort (not in the text)

• Special Purpose — Not Comparison Sorts� Pigeonhole Sort

� Radix Sort

10 Oct 2008 CS 311 Fall 2008 11

ReviewSorting Algorithms I — Bubble Sort: Analysis

Efficiency �

• Bubble Sort is O(n2).

• Bubble Sort also has an average-case time of O(n2). �

Requirements on Data ☺

• Bubble Sort does not require random-access data.

• It works on Linked Lists.

Space Usage ☺

• Bubble Sort can be done in-place.

Stability ☺

• Bubble Sort is stable.

Performance on Nearly Sorted Data ☺/�

• (1) We can write Bubble Sort to be O(n) if no item is far out of place. ☺

• (2) Bubble Sort is O(n2) even if only one item is far out of place. �

10 Oct 2008 CS 311 Fall 2008 12

ReviewSorting Algorithms I — Bubble Sort: Comments

Bubble Sort is very slow.

• It is never used, except:

� In CS classes, as a simple example.

� By people who do not understand sorting algorithms very well.

10 Oct 2008 CS 311 Fall 2008 13

Sorting Algorithms I, cont’dSelection Sort — Introduction

Bubble Sort wastes time.• Why “bubble” large items up? Instead, select (find) the largest item, then put it at the top in a single swap operation.� We could put the item at the top with “remove” then “insert”. But a single “swap” is almost always faster.

• The resulting algorithm is called Selection Sort.

Operations Needed• Compare

• Swap

Slightly more general operations are needed: we must be able to compare & swap non-consecutive items.

TO DO• Implement Selection Sort.

• Analyze.� See the next slide.

Done. See selection_sort.cpp, on the web page.

10 Oct 2008 CS 311 Fall 2008 14

Sorting Algorithms I, cont’dSelection Sort — Analysis

Efficiency �

• Selection Sort is O(n2).

• Selection Sort also has an average-case time of O(n2). �

Requirements on Data ☺

• Selection Sort does not require random-access data.

• It works on Linked Lists.

Space Usage ☺

• Selection Sort can be done in-place.

Stability �

• Selection Sort is not stable.

� Making it stable would slow it down even more.

Performance on Nearly Sorted Data �

• Not significantly faster on nearly sorted data: O(n2).

10 Oct 2008 CS 311 Fall 2008 15

Sorting Algorithms I, cont’dSelection Sort — Comments

We see that Selection Sort is very slow.• It is never used.

Selection Sort uses far fewer “swap” operations than Bubble Sort, but it still requires many “compare” operations.

Thus, Selection Sort is not significantly better than Bubble Sort in any way.• And it is worse in two significant ways:

� It is not at all fast for nearly sorted data.

� It is not stable.

• The moral of the story: sometimes “improvements” don’t do much good.

Next we begin looking at sorting algorithms that are actually used …

10 Oct 2008 CS 311 Fall 2008 16

Sorting Algorithms I, cont’dInsertion Sort — Introduction

Selection Sort constructs a sorted sequence in (backwards) order:• Find the greatest item, then find the next greatest, etc.

• So for each position, starting with the last, it finds the item that belongs there.

Suppose we “flip” this idea.• Instead of looking through the positions and determining what item goes there, look through the given items, and determine in which position each belongs.

The resulting algorithm is called Insertion Sort.• Iterate through the items in the sequence.

• For each, insert it in the proper place among the preceding items.

• Thus, when we are processing item k, we have items 0 .. k–1 already in sorted order.

10 Oct 2008 CS 311 Fall 2008 17

Sorting Algorithms I, cont’dInsertion Sort — Illustration

Items to left of bold bar are sorted.

Bold item = item to be inserted into sorted section.

5 9 2 5 26

6 9 2 5 25

6 9 2 5 25

Insert 5

Insert 9

5 6 9 5 22

5 5 6 9 22

2 5 5 6 92

Insert 5

Insert 2

Sorted

Two 5’s!

Two 2’s!

5 6 9 5 22

Insert 2

10 Oct 2008 CS 311 Fall 2008 18

Sorting Algorithms I, cont’dInsertion Sort — Issues

When we insert item k into the list of already sorted items(0 .. k–1), three operations are needed:• Find (the proper location).

• Remove (the old copy).

• Insert (into the proper location).

If we have random-access data (an array?), then• Find could be fast [Binary Search: O(log k)].

� In practice, we do not actually use Binary Search, as we will see.

• Remove + Insert is slow-ish [move items up: O(k)].

• Find + Remove + Insert is O(k).

If we have a Linked List, then• Find is slow-ish [Sequential Search: O(k)].

• Remove is fast [Linked-List removal: O(1)].

• Insert is fast [Linked-List insertion: O(1)].

• Find + Remove + Insert is O(k).

In both cases, Find + Remove + Insert is O(k).

10 Oct 2008 CS 311 Fall 2008 19

Sorting Algorithms I, cont’dInsertion Sort — Do It

TO DO

• Implement Insertion Sort.

• Analyze, as before.

� See the next slide.

Done. See insertion_sort.cpp, on the web page.

10 Oct 2008 CS 311 Fall 2008 20

Sorting Algorithms I, cont’dInsertion Sort — Analysis

Efficiency �• Insertion Sort is O(n2).• Insertion Sort also has an average-case time of O(n2). �

Requirements on Data ☺• Insertion Sort does not require random-access data.• It works on Linked Lists*.

Space Usage ☺• Insertion Sort can be done in-place*.

Stability ☺• Insertion Sort is stable.

Performance on Nearly Sorted Data ☺• (1) Insertion Sort can be written to be O(n) if each item is at most some constant distance from its proper place*.

• (2) Insertion Sort can be written to be O(n) if only a constant number of items are out of place.

*The above are not quite true on one-way sequential-access data, including Linked Lists. We give up EITHER low additional storage OR linear time on type (1) nearly sorted data.

10 Oct 2008 CS 311 Fall 2008 21

Sorting Algorithms I, cont’dInsertion Sort — Comments

Insertion Sort is too slow for general-purpose use.

However, Insertion Sort is useful in certain special cases.

• Insertion Sort is fast (linear time) for nearly sorted data.

• Insertion Sort is also considered fast for small lists.

Insertion Sort often appears as part of another algorithm.

• Most good sorting methods call Insertion Sort for small lists.

• Some sorting methods get the data nearly sorted, and then finishwith an Insertion Sort. (More on this later.)

Implementation Issue

• For random-access data, Insertion Sort could use Binary Search. However, this is not the best method for nearly sorted data. Since that is when Insertion Sort is actually useful, we generally do the “find” step of Insertion Sort using a backwards Sequential Search.

� Note: For one-way sequential-access data, this method requires significant additional space (which need not hold data items).