Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer...

335
Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed as CS2984, this course replaces CS2606.

Transcript of Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer...

Page 1: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Coursenotes

CS3114: Data Structures andAlgorithms*

Clifford A. ShafferDepartment of Computer Science

Virginia TechCopyright © 2008

*Temporarily listed as CS2984, this course replaces CS2606.

Page 2: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Goals of this Course

1. Reinforce the concept that costs and benefits exist for every data structure.

2. Learn the commonly used data structures.– These form a programmer's basic data

structure ``toolkit.'‘

3. Understand how to measure the cost of a data structure or program.– These techniques also allow you to judge the

merits of new data structures that you or others might invent.

Page 3: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

The Need for Data Structures

Data structures organize data

more efficient programs.

More powerful computers

more complex applications.

More complex applications demand more calculations.

Complex computing tasks are unlike our everyday experience.

Page 4: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Organizing Data

Any organization for a collection of records can be searched, processed in any order, or modified.

The choice of data structure and algorithm can make the difference between a program running in a few seconds or many days.

Page 5: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Efficiency

A solution is said to be efficient if it solves the problem within its resource constraints.– Space– Time

• The cost of a solution is the amount of resources that the solution consumes.

Page 6: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Selecting a Data Structure

Select a data structure as follows:

1. Analyze the problem to determine the basic operations that must be supported.

2. Quantify the resource constraints for each operation.

3. Select the data structure that best meets these requirements.

Page 7: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Some Questions to Ask

• Are all data inserted into the data structure at the beginning, or are insertions interspersed with other operations?

• Can data be deleted?

• Are all data processed in some well-defined order, or is random access allowed?

Page 8: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Costs and Benefits

Each data structure has costs and benefits.

Rarely is one data structure better than another in all situations.

Any data structure requires:– space for each data item it stores,– time to perform each basic operation,– programming effort.

Page 9: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Costs and Benefits (cont)

Each problem has constraints on available space and time.

Only after a careful analysis of problem characteristics can we know the best data structure for the task.

Bank example:– Start account: a few minutes– Transactions: a few seconds– Close account: overnight

Page 10: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Example 1.2

Problem: Create a database containing information about cities and towns.

Tasks: Find by name or attribute or location

• Exact match, range query, spatial queryResource requirements: Times can be

from a few seconds for simple queries to a minute or two for complex queries

Page 11: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Scheduling

• Managing large-scale projects involves scheduling activities– It is human nature to work better toward

intermediate milestones.

• The same concepts can/should be applied to mid-sized projects encountered in class.– For any project that needs more than a week

of active work to complete, break into parts and design a schedule with milestones and deliverables.

Page 12: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Real Results #1

• CS2606, Fall 2006• 3-4 week projects• Kept schedule information:

– Estimated time required– Milestones, estimated times for each– Weekly estimates of time spent.

Page 13: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Real Results #2

Page 14: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Real Results #3

• Results were significant:– 90% of scores below median were students

who did less than 50% of the project prior to the last week.

– Few did poorly who put in > 50% time early– Some did well who didn’t put in >50% time

early, but most who did well put in the early time

Page 15: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Real Results #4

• Correlations:– Strong correlation between early time and

high score– No correlation between time spent and

score– No correlation between % early time and

total time

15

Page 16: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

What is the Mechanism?

• Correlations are not causal– Do they behave that way because they are good,

or does behaving that way make them good?

• Spreading projects over time allows the “sleep on it” heuristic to operate

• Avoiding the “zombie” effect makes people more productive (and cuts time requirements)

Page 17: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

17

Mathematical Background

Set concepts and notation

Logarithms

Recursion

Induction Proofs

Summations

Recurrence Relations

Page 18: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

18

Estimation Techniques

Known as “back of the envelope” or “back of the napkin” calculation

1. Determine the major parameters that effect the problem.

2. Derive an equation that relates the parameters to the problem.

3. Select values for the parameters, and apply the equation to yield and estimated solution.

Page 19: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

19

Estimation Example

How many library bookcases does it take to store books totaling one million pages?

Estimate:– Pages/inch– Feet/shelf– Shelves/bookcase

Page 20: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Abstract Data Types

Abstract Data Type (ADT): a definition for a data type solely in terms of a set of values and a set of operations on that data type.

Each ADT operation is defined by its inputs and outputs.

Encapsulation: Hide implementation details.

Page 21: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Data Structure

• A data structure is the physical implementation of an ADT.– Each operation associated with the ADT is

implemented by one or more subroutines in the implementation.

• Data structure usually refers to an organization for data in main memory.

• File structure: an organization for data on peripheral storage, such as a disk drive.

Page 22: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Metaphors

An ADT manages complexity through abstraction: metaphor.– Hierarchies of labels

Ex: transistors gates CPU.

In a program, implement an ADT, then think only about the ADT, not its implementation.

Page 23: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Logical vs. Physical Form

Data items have both a logical and a physical form.

Logical form: definition of the data item within an ADT.– Ex: Integers in mathematical sense: +, -

Physical form: implementation of the data item within a data structure.– Ex: 16/32 bit integers, overflow.

Page 24: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Data Type

ADT:TypeOperations

Data Items: Logical Form

Data Items: Physical Form

Data Structure:Storage SpaceSubroutines

Page 25: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Example 1.8

A typical database-style project will have many interacting parts.

Page 26: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

26

Algorithm Efficiency

There are often many approaches (algorithms) to solve a problem. How do we choose between them?

At the heart of computer program design are two (sometimes conflicting) goals.

1. To design an algorithm that is easy to understand, code, debug.

2. To design an algorithm that makes efficient use of the computer’s resources.

Page 27: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

27

Algorithm Efficiency (cont)

Goal (1) is the concern of Software Engineering.

Goal (2) is the concern of data structures and algorithm analysis.

When goal (2) is important, how do we measure an algorithm’s cost?

Page 28: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

28

How to Measure Efficiency?

1. Empirical comparison (run programs)2. Asymptotic Algorithm Analysis

Critical resources:

Factors affecting running time:

For most algorithms, running time depends on “size” of the input.

Running time is expressed as T(n) for some function T on input size n.

Page 29: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

29

Examples of Growth Rate

Example 1.

// Return position of largest value in "A"static int largest(int[] A) { int currlarge = 0; // Position of largest for (int i=1; i<A.length; i++) if (A[currlarge] < A[i]) currlarge = i; // Remember pos return currlarge; // Return largest pos}

Page 30: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

30

Examples (cont)

Example 2: Assignment statement.

Example 3:

sum = 0;for (i=1; i<=n; i++) for (j=1; j<n; j++) sum++;}

Page 31: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

31

Growth Rate Graph

Page 32: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

32

Best, Worst, Average Cases

Not all inputs of a given size take the same time to run.

Sequential search for K in an array of n integers:

• Begin at first element in array and look at each element in turn until K is found

Best case:

Worst case:

Average case:

Page 33: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

33

Which Analysis to Use?

While average time appears to be the fairest measure, it may be difficult to determine.

When is the worst case time important?

Page 34: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

34

Faster Computer or Algorithm?Suppose we buy a computer 10 times faster.n: size of input that can be processed in one second

on old computer (in 1000 computational units)n’: size of input that can be processed in one second

on new computer (in 10,000 computational units)

T(n) n n’ Change n’/n

10n 100 1,000 n’ = 10n 10

10n2 10 31.6 n’= 10n 3.16

10n 3 4 n’ = n + 1 1 + 1/n

Page 35: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

35

Asymptotic Analysis: Big-oh

Definition: For T(n) a non-negatively valued function, T(n) is in the set O(f(n)) if there exist two positive constants c and n0 such that T(n) <= cf(n) for all n > n0.

Use: The algorithm is in O(n2) in [best, average, worst] case.

Meaning: For all data sets big enough (i.e., n>n0), the algorithm always executes in less than cf(n) steps in [best, average, worst] case.

Page 36: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

36

Big-oh Notation (cont)

Big-oh notation indicates an upper bound.

Example: If T(n) = 3n2 then T(n) is in O(n2).

Look for the tightest upper bound:

While T(n) = 3n2 is in O(n3), we prefer O(n2).

Page 37: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

37

Big-Oh Examples

Example 1: Finding value X in an array (average cost).

Then T(n) = csn/2.

For all values of n > 1, csn/2 <= csn.

Therefore, the definition is satisfied for f(n)=n, n0 = 1, and c = cs.

Hence, T(n) is in O(n).

Page 38: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

38

Big-Oh Examples

Example 2: Suppose T(n) = c1n2 + c2n, where c1 and c2 are positive.

c1n2 + c2n <= c1n2 + c2n2 <= (c1 + c2)n2 for all n > 1.

Then T(n) <= cn2 whenever n > n0, for c = c1 + c2 and n0 = 1.

Therefore, T(n) is in O(n2) by definition.

Example 3: T(n) = c. Then T(n) is in O(1).

Page 39: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

39

A Common Misunderstanding

“The best case for my algorithm is n=1 because that is the fastest.” WRONG!

Big-oh refers to a growth rate as n grows to .

Best case is defined for the input of size n that is cheapest among all inputs of size n.

Page 40: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

40

Big-Omega

Definition: For T(n) a non-negatively valued function, T(n) is in the set (g(n)) if there exist two positive constants c and n0 such that T(n) >= cg(n) for all n > n0.

Meaning: For all data sets big enough (i.e., n > n0), the algorithm always requires more than cg(n) steps.

Lower bound.

Page 41: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

41

Big-Omega Example

T(n) = c1n2 + c2n.

c1n2 + c2n >= c1n2 for all n > 1.

T(n) >= cn2 for c = c1 and n0 = 1.

Therefore, T(n) is in (n2) by the definition.

We want the greatest lower bound.

Page 42: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

42

Theta Notation

When big-Oh and coincide, we indicate this by using (big-Theta) notation.

Definition: An algorithm is said to be in (h(n)) if it is in O(h(n)) and it is in (h(n)).

Page 43: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

43

A Common Misunderstanding

Confusing worst case with upper bound.

Upper bound refers to a growth rate.

Worst case refers to the worst input from among the choices for possible inputs of a given size.

Page 44: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

44

Simplifying Rules

1. If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)).

2. If f(n) is in O(kg(n)) for some constant k > 0, then f(n) is in O(g(n)).

3. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))).

4. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)) then f1(n)f2(n) is in O(g1(n)g2(n)).

Page 45: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

45

Time Complexity Examples (1)

Example 3.9: a = b;

This assignment takes constant time, so it is (1).

Example 3.10:sum = 0;for (i=1; i<=n; i++) sum += n;

Page 46: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

46

Time Complexity Examples (2)

Example 3.11:sum = 0;for (j=1; j<=n; j++) for (i=1; i<=j; i++) sum++;for (k=0; k<n; k++) A[k] = k;

Page 47: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

47

Time Complexity Examples (3)

Example 3.12:sum1 = 0;for (i=1; i<=n; i++) for (j=1; j<=n; j++) sum1++;

sum2 = 0;for (i=1; i<=n; i++) for (j=1; j<=i; j++) sum2++;

Page 48: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

48

Time Complexity Examples (4)

Example 3.13:sum1 = 0;for (k=1; k<=n; k*=2) for (j=1; j<=n; j++) sum1++;

sum2 = 0;for (k=1; k<=n; k*=2) for (j=1; j<=k; j++) sum2++;

Page 49: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

49

Binary Search

How many elements are examined in worst case?

Page 50: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

50

Binary Search// Return the position of an element in "A"// with value "K". If "K" is not in "A",// return A.length.static int binary(int[] A, int K) { int l = -1; // Set l and r int r = A.length; // beyond array bounds while (l+1 != r) { // Stop when l, r meet int i = (l+r)/2; // Check middle if (K < A[i]) r = i; // In left half if (K == A[i]) return i; // Found it if (K > A[i]) l = i; // In right half } return A.length; // Search value not in A}

Page 51: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

51

Other Control Statements

while loop: Analyze like a for loop.

if statement: Take greater complexity of then/else clauses.

switch statement: Take complexity of most expensive case.

Subroutine call: Complexity of the subroutine.

Page 52: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

52

Problems

• Problem: a task to be performed.– Best thought of as inputs and matching

outputs.– Problem definition should include constraints

on the resources that may be consumed by any acceptable solution.

Page 53: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

53

Problems (cont)

• Problems mathematical functions– A function is a matching between inputs (the

domain) and outputs (the range).– An input to a function may be single number,

or a collection of information.– The values making up an input are called the

parameters of the function.– A particular input must always result in the

same output every time the function is computed.

Page 54: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

54

Algorithms and ProgramsAlgorithm: a method or a process followed to

solve a problem.– A recipe.

An algorithm takes the input to a problem (function) and transforms it to the output.– A mapping of input to output.

A problem can have many algorithms.

Page 55: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

55

Analyzing Problems

Upper bound: Upper bound of best known algorithm.

Lower bound: Lower bound for every possible algorithm.

Page 56: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

56

Space/Time Tradeoff Principle

One can often reduce time if one is willing to sacrifice space, or vice versa.

• Encoding or packing informationBoolean flags

• Table lookupFactorials

Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.

Page 57: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

57

Analyzing Problems: Example

May or may not be able to obtain matching upper and lower bounds.

Example of imperfect knowledge: Sorting

1. Cost of I/O: (n).

2. Bubble or insertion sort: O(n2).

3. A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n).

4. We prove later that sorting is in (n log n).

Page 58: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

58

Multiple Parameters

Compute the rank ordering for all C pixel values in a picture of P pixels.

for (i=0; i<C; i++) // Initialize count count[i] = 0;for (i=0; i<P; i++) // Look at all pixels count[value(i)]++; // Increment countsort(count); // Sort pixel counts

If we use P as the measure, then time is (P log P).

More accurate is (P + C log C).

Page 59: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

59

Space Complexity

Space complexity can also be analyzed with asymptotic complexity analysis.

Time: AlgorithmSpace: Data Structure

Page 60: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

60

Lists

A list is a finite, ordered sequence of data items.

Important concept: List elements have a position.

Notation: <a0, a1, …, an-1>

What operations should we implement?

Page 61: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

61

List Implementation Concepts

Our list implementation will support the concept of a current position.

Operations will act relative to the current position.

<20, 23 | 12, 15>

Page 62: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

62

List ADTpublic interface List<E> { public void clear(); public void insert(E item); public void append(E item); public E remove(); public void moveToStart(); public void moveToEnd(); public void prev(); public void next(); public int length(); public int currPos(); public void moveToPos(int pos); public E getValue();}

Page 63: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

63

List ADT Examples

List: <12 | 32, 15>

L.insert(99);

Result: <12 | 99, 32, 15>

Iterate through the whole list:

for (L.moveToStart(); L.currPos()<L.length(); L.next()) { it = L.getValue(); doSomething(it);}

Page 64: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

64

List Find Function

/** @return True if "K" is in list "L", false otherwise */public static boolean find(List<Integer> L, int K) {

int it; for (L.moveToStart();

L.currPos()<L.length(); L.next()) { it = L.getValue(); if (K == it) return true; // Found K } return false; // K not found}

Page 65: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

65

Array-Based List Insert

Page 66: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

66

Array-Based List Class (1)class AList<E> implements List<E> { private static final int defaultSize = 10;

private int maxSize; private int listSize; private int curr; private E[] listArray;

// Constructors AList() { this(defaultSize); } @SuppressWarnings("unchecked") AList(int size) { maxSize = size; listSize = curr = 0; listArray = (E[])new Object[size]; }

Page 67: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

67

Array-Based List Class (2)

public void clear() { listSize = curr = 0; }public void moveToStart() { curr = 0; }public void moveToEnd() { curr = listSize; }public void prev() { if (curr != 0) curr--; }public void next() { if (curr < listSize) curr++; }public int length() { return listSize; }public int currPos() { return curr; }

Page 68: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

68

Array-Based List Class (3)

public void moveToPos(int pos) { assert (pos>=0) && (pos<=listSize) : "Position out of range";

curr = pos;}

public E getValue() { assert (curr >= 0) && (curr < listSize) : "No current element";

return listArray[curr];}

Page 69: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

69

Insert

// Insert "it" at current position */public void insert(E it) { assert listSize < maxSize : "List capacity exceeded"; for (int i=listSize; i>curr; i--) listArray[i] = listArray[i-1]; listArray[curr] = it; listSize++;}

Page 70: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

70

Appendpublic void append(E it) { // Append "it" assert listSize < maxSize : "List capacity exceeded"; listArray[listSize++] = it;}

Page 71: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

71

Remove

// Remove and return the current element.public E remove() { assert (curr >= 0) && (curr < listSize) :

"No current element"; E it = listArray[curr]; for(int i=curr; i<listSize-1; i++) listArray[i] = listArray[i+1]; listSize--; return it;}

Page 72: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

72

Link ClassDynamic allocation of new list elements.class Link<E> { private E element; private Link<E> next;

// Constructors Link(E it, Link<E> nextval) { element = it; next = nextval; } Link(Link<E> nextval) { next = nextval; }

Link<E> next() { return next; } Link<E> setNext(Link<E> nextval) { return next = nextval; } E element() { return element; } E setElement(E it) { return element = it; }}

Page 73: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

73

Linked List Position (1)

Page 74: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

74

Linked List Position (2)

Page 75: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

75

Linked List Class (1)

class LList<E> implements List<E> { private Link<E> head; private Link<E> tail; protected Link<E> curr; int cnt;

//Constructors LList(int size) { this(); } LList() { curr = tail = head = new Link<E>(null); cnt = 0; }

Page 76: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

76

Linked List Class (2)public void clear() { head.setNext(null); curr = tail = head = new Link<E>(null); cnt = 0;}public void moveToStart() { curr = head; }public void moveToEnd() { curr = tail; }public int length() { return cnt; }public void next() { if (curr != tail) { curr = curr.next(); }}public E getValue() { assert curr.next() != null : "Nothing to get"; return curr.next().element();}

Page 77: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

77

Insertion

Page 78: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

78

Insert/Append// Insert "it" at current positionpublic void insert(E it) { curr.setNext(new Link<E>(it, curr.next()));

if (tail == curr) tail = curr.next(); cnt++;}

public void append(E it) { tail = tail.setNext(new Link<E>(it, null)); cnt++;}

Page 79: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

79

Removal

Page 80: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

80

Remove/** Remove and return current element */public E remove() { if (curr.next() == null) return null; E it = curr.next().element(); if (tail == curr.next()) tail = curr; curr.setNext(curr.next().next()); cnt--; return it; }

Page 81: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

81

Prev/** Move curr one step left; no change if already at front */public void prev() { if (curr == head) return; Link<E> temp = head; // March down list until we find the // previous element while (temp.next() != curr) temp = temp.next(); curr = temp;}

Page 82: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

82

Get/Set Position/** Return position of the current element */public int currPos() { Link<E> temp = head; int i; for (i=0; curr != temp; i++) temp = temp.next(); return i;}

/** Move down list to "pos" position */public void moveToPos(int pos) { assert (pos>=0) && (pos<=cnt) : "Position out of range"; curr = head; for(int i=0; i<pos; i++) curr = curr.next();}

Page 83: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

83

Comparison of Implementations

Array-Based Lists:• Insertion and deletion are (n).• Prev and direct access are (1).• Array must be allocated in advance.• No overhead if all array positions are full.

Linked Lists:• Insertion and deletion are (1).• Prev and direct access are (n).• Space grows with number of elements.• Every element requires overhead.

Page 84: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

84

Space Comparison

“Break-even” point:

DE = n(P + E);

n = DE P + E

E: Space for data value.P: Space for pointer.D: Number of elements in array.

Page 85: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Space Example

• Array-based list: Overhead is one pointer (4 bytes) per position in array – whether used or not.

• Linked list: Overhead is two pointers per link node– one to the element, one to the next link

• Data is the same for both.• When is the space the same?

– When the array is half full

85

Page 86: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

86

Freelists

System new and garbage collection are slow.

• Add freelist support to the Link class.

Page 87: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Link Class Extensionsstatic Link freelist = null;

static <E> Link<E> get(E it, Link<E> nextval) { if (freelist == null) return new Link<E>(it, nextval); Link<E> temp = freelist; freelist = freelist.next(); temp.setElement(it); temp.setNext(nextval); return temp;}

void release() { // Return to freelist element = null; next = freelist; freelist = this;}

87

Page 88: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

88

Using Freelistpublic void insert(E it) { curr.setNext(Link.get(it, curr.next())); if (tail == curr) tail = curr.next(); cnt++;}

public E remove() { if (curr.next() == null) return null; E it = curr.next().element(); if (tail == curr.next()) tail = curr; Link<E> tempptr = curr.next(); curr.setNext(curr.next().next()); tempptr.release(); cnt--; return it; }

Page 89: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

89

Doubly Linked Listsclass DLink<E> { private E element; private DLink<E> next; private DLink<E> prev;

DLink(E it, DLink<E> n, DLink<E> p) { element = it; next = n; prev = p; } DLink(DLink<E> n, DLink<E> p) { next = n; prev = p; }

DLink<E> next() { return next; } DLink<E> setNext(DLink<E> nextval) { return next = nextval; } DLink<E> prev() { return prev; } DLink<E> setPrev(DLink<E> prevval) { return prev = prevval; } E element() { return element; } E setElement(E it) { return element = it; }}

Page 90: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

90

Doubly Linked Lists

Page 91: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

91

Doubly Linked Insert

Page 92: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

92

Doubly Linked Insert

public void insert(E it) { curr.setNext(new DLink<E>(it, curr.next(), curr));

if (curr.next().next() != null) curr.next().next().setPrev(curr.next());

if (tail == curr) tail = curr.next(); cnt++;}

Page 93: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

93

Doubly Linked Remove

Page 94: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

94

Doubly Linked Remove

public E remove() { if (curr.next() == null) return null; E it = curr.next().element(); if (curr.next().next() != null) curr.next().next().setPrev(curr); else tail = curr; curr.setNext(curr.next().next()); cnt--; return it; }

Page 95: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

95

Stacks

LIFO: Last In, First Out.

Restricted form of list: Insert and remove only at front of list.

Notation:• Insert: PUSH• Remove: POP• The accessible element is called TOP.

Page 96: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

96

Stack ADTpublic interface Stack<E> { /** Reinitialize the stack. */ public void clear();

/** Push an element onto the top of the stack. @param it Element being pushed onto the stack.*/ public void push(E it);

/** Remove and return top element. @return The element at the top of the stack.*/ public E pop();

/** @return A copy of the top element. */ public E topValue();

/** @return Number of elements in the stack. */ public int length();};

Page 97: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

97

Array-Based Stack

// Array-based stack implementationprivate int maxSize; // Max size of stackprivate int top; // Index for topprivate E [] listArray;

Issues:• Which end is the top?• Where does “top” point to?• What are the costs of the operations?

Page 98: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

98

Linked Stackclass LStack<E> implements Stack<E> { private Link<E> top; private int size;

What are the costs of the operations?

How do space requirements compare to the array-based stack implementation?

Page 99: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

99

Queues

FIFO: First in, First Out

Restricted form of list: Insert at one end, remove from the other.

Notation:• Insert: Enqueue• Delete: Dequeue• First element: Front• Last element: Rear

Page 100: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

100

Queue Implementation (1)

Page 101: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

101

Queue Implementation (2)

Page 102: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

102

Dictionary

Often want to insert records, delete records, search for records.

Required concepts:• Search key: Describe what we are looking

for• Key comparison

– Equality: sequential search– Relative order: sorting

Page 103: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Records and Keys

• Problem: How do we extract the key from a record?

• Records can have multiple keys.

• Fundamentally, the key is not a property of the record, but of the context.

• Solution: We will explicitly store the key with the record.

103

Page 104: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

104

Dictionary ADTpublic interface Dictionary<K, E> {

public void clear(); public void insert(K k, E e); public E remove(K k); // Null if none public E removeAny(); // Null if none public E find(K k); // Null if none public int size();};

Page 105: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Payroll Class// Simple payroll entry: ID, name, addressclass Payroll { private Integer ID; private String name; private String address;

Payroll(int inID, String inname, String inaddr) { ID = inID; name = inname; address = inaddr; }

public Integer getID() { return ID; } public String getname() { return name; } public String getaddr() { return address; }}

105

Page 106: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Using Dictionary// IDdict organizes Payroll records by IDDictionary<Integer, Payroll> IDdict = new UALdictionary<Integer, Payroll>();

// namedict organizes Payroll records by nameDictionary<String, Payroll> namedict = new UALdictionary<String, Payroll>();

Payroll foo1 = new Payroll(5, "Joe", "Anytown");Payroll foo2 = new Payroll(10, "John", "Mytown");IDdict.insert(foo1.getID(), foo1);IDdict.insert(foo2.getID(), foo2);namedict.insert(foo1.getname(), foo1);namedict.insert(foo2.getname(), foo2);Payroll findfoo1 = IDdict.find(5);Payroll findfoo2 = namedict.find("John");

106

Page 107: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

107

Unsorted List Dictionaryclass UALdictionary<K, E> implements Dictionary<K, E> {

private static final int defaultSize = 10; private AList<KVpair<K,E>> list;

// Constructors UALdictionary() { this(defaultSize); } UALdictionary(int sz) { list = new AList<KVpair<K, E>>(sz); }

public void clear() { list.clear(); }

/** Insert an element: append to list */ public void insert(K k, E e) { KVpair<K,E> temp = new KVpair<K,E>(k, e); list.append(temp); }

Page 108: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Sorted vs. Unsorted List Dictionaries

• If list were sorted– Could use binary search to speed search– Would need to insert in order, slowing

insert

• Which is better?– If lots of searches, sorted list is good– If inserts are as likely as searches, then

sorting is no benefit.

108

Page 109: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

109

Binary Trees

A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees, called the left and right subtrees, which are disjoint from each other and from the root.

Page 110: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

110

Binary Tree Example

Notation: Node, children, edge, parent, ancestor, descendant, path, depth, height, level, leaf node, internal node, subtree.

Page 111: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

111

Full and Complete Binary Trees

Full binary tree: Each node is either a leaf or internal node with exactly two non-empty children.

Complete binary tree: If the height of the tree is d, then all leaves except possibly level d are completely full. The bottom level has all nodes to the left side.

Page 112: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

112

Full Binary Tree Theorem (1)

Theorem: The number of leaves in a non-empty full binary tree is one more than the number of internal nodes.

Proof (by Mathematical Induction):

Base case: A full binary tree with 1 internal node must have two leaf nodes.

Induction Hypothesis: Assume any full binary tree T containing n-1 internal nodes has n leaves.

Page 113: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

113

Full Binary Tree Theorem (2)

Induction Step: Given tree T with n internal nodes, pick internal node I with two leaf children. Remove I’s children, call resulting tree T’.

By induction hypothesis, T’ is a full binary tree with n leaves.

Restore I’s two children. The number of internal nodes has now gone up by 1 to reach n. The number of leaves has also gone up by 1.

Page 114: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

114

Full Binary Tree Corollary

Theorem: The number of null pointers in a non-empty tree is one more than the number of nodes in the tree.

Proof: Replace all null pointers with a pointer to an empty leaf node. This is a full binary tree.

Page 115: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

115

Binary Tree Node Class/** ADT for binary tree nodes */public interface BinNode<E> { /** Return and set the element value */ public E element(); public E setElement(E v);

/** Return the left child */ public BinNode<E> left();

/** Return the right child */ public BinNode<E> right();

/** Return true if this is a leaf node */ public boolean isLeaf();}

Page 116: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

116

Traversals (1)

Any process for visiting the nodes in some order is called a traversal.

Any traversal that lists every node in the tree exactly once is called an enumeration of the tree’s nodes.

Page 117: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

117

Traversals (2)

• Preorder traversal: Visit each node before visiting its children.

• Postorder traversal: Visit each node after visiting its children.

• Inorder traversal: Visit the left subtree, then the node, then the right subtree.

Page 118: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

118

Traversals (3)/** @param rt The root of the subtree */void preorder(BinNode rt){ if (rt == null) return; // Empty subtree visit(rt); preorder(rt.left()); preorder(rt.right());}

void preorder2(BinNode rt) // Not so good{ visit(rt); if (rt.left() != null) preorder(rt.left()); if (rt.right() != null) preorder(rt.right());}

Page 119: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

119

Recursion Examplesint count(BinNode rt) { if (rt == null) return 0; return 1 + count(rt.left()) + count(rt.right());}

boolean checkBST(BSTNode<Integer,Integer> root, Integer low, Integer high) { if (root == null) return true; Integer rootkey = root.key(); if ((rootkey < low) || (rootkey > high)) return false; // Out of range if (!checkBST(root.left(), low, rootkey)) return false; // Left side failed return checkBST(root.right(), rootkey, high);}

Page 120: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

120

Binary Tree Implementation (1)

Page 121: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

121

Binary Tree Implementation (2)

Page 122: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

122

Inheritance (1)public interface VarBinNode { public boolean isLeaf();}

class VarLeafNode implements VarBinNode { private String operand; public VarLeafNode(String val) { operand = val; } public boolean isLeaf() { return true; } public String value() { return operand; }};

Page 123: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

123

Inheritance (2)

/** Internal node */class VarIntlNode implements VarBinNode { private VarBinNode left; private VarBinNode right; private Character operator;

public VarIntlNode(Character op, VarBinNode l, VarBinNode r) { operator = op; left = l; right = r; } public boolean isLeaf() { return false; } public VarBinNode leftchild() { return left; } public VarBinNode rightchild(){ return right; } public Character value() { return operator; }}

Page 124: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

124

Inheritance (3)

/** Preorder traversal */public static void traverse(VarBinNode rt) { if (rt == null) return; if (rt.isLeaf()) Visit.VisitLeafNode(((VarLeafNode)rt).value()); else { Visit.VisitInternalNode( ((VarIntlNode)rt).value()); traverse(((VarIntlNode)rt).leftchild()); traverse(((VarIntlNode)rt).rightchild()); }}

Page 125: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

125

Composition (1)

public interface VarBinNode { public boolean isLeaf(); public void traverse();}

class VarLeafNode implements VarBinNode { private String operand;

public VarLeafNode(String val) { operand = val; } public boolean isLeaf() { return true; } public String value() { return operand; }

public void traverse() { Visit.VisitLeafNode(operand); }}

Page 126: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

126

Composition (2)

class VarIntlNode implements VarBinNode { private VarBinNode left; private VarBinNode right; private Character operator;

public VarIntlNode(Character op, VarBinNode l, VarBinNode r) { operator = op; left = l; right = r; } public boolean isLeaf() { return false; } public VarBinNode leftchild() { return left; } public VarBinNode rightchild() { return right; } public Character value() { return operator; }

public void traverse() { Visit.VisitInternalNode(operator); if (left != null) left.traverse(); if (right != null) right.traverse(); }}

Page 127: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

127

Composition (3)

/** Preorder traversal */public static void traverse(VarBinNode rt) { if (rt != null) rt.traverse();}

Page 128: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

128

Space Overhead (1)

From the Full Binary Tree Theorem:• Half of the pointers are null.

If leaves store only data, then overhead depends on whether the tree is full.

Ex: Full tree, all nodes the same, with two pointers to children and one to element:

• Total space required is (3p + d)n• Overhead: 3pn• If p = d, this means 3p/(3p + d) = 3/4 overhead.

Page 129: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

129

Space Overhead (2)

Eliminate pointers from the leaf nodes:

n/2(2p) + np 2p n/2(2p) + np +dn 2p + d

This is 2/3 if p = d.

(2p +p)/(2p + d + p) if data only at leaves 3/4 overhead.

Note that some method is needed to distinguish leaves from internal nodes.

=

Page 130: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

130

Array Implementation (1)

Position 0 1 2 3 4 5 6 7 8 9 10 11

Parent -- 0 0 1 1 2 2 3 3 4 4 5

Left Child 1 3 5 7 9 11 -- -- -- -- -- --

Right Child 2 4 6 8 10 -- -- -- -- -- -- --

Left Sibling -- -- 1 -- 3 -- 5 -- 7 -- 9 --

Right Sibling -- 2 -- 4 -- 6 -- 8 -- 10 -- --

Page 131: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

131

Array Implementation (1)

Parent (r) =

Leftchild(r) =

Rightchild(r) =

Leftsibling(r) =

Rightsibling(r) =

Page 132: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

132

Binary Search Trees

BST Property: All elements stored in the left subtree of a node with value K have values < K. All elements stored in the right subtree of a node with value K have values >= K.

Page 133: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

133

BSTNode (1)class BSTNode<K,E> implements BinNode<E> { private K key; private E element; private BSTNode<K,E> left; private BSTNode<K,E> right;

public BSTNode() {left = right = null; } public BSTNode(K k, E val) { left = right = null; key = k; element = val; } public BSTNode(K k, E val, BSTNode<K,E> l, BSTNode<K,E> r) { left = l; right = r; key = k; element = val; }

public K key() { return key; } public K setKey(K k) { return key = k; }

public E element() { return element; } public E setElement(E v) { return element = v; }

Page 134: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

134

BSTNode (2) public BSTNode<K,E> left() { return left; } public BSTNode<K,E> setLeft(BSTNode<K,E> p) { return left = p; }

public BSTNode<K,E> right() { return right; } public BSTNode<K,E> setRight(BSTNode<K,E> p) { return right = p; }

public boolean isLeaf() { return (left == null) && (right == null); }}

Page 135: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

135

BST (1)/** BST implementation for Dictionary ADT */class BST<K extends Comparable<? super K>, E> implements Dictionary<K, E> { private BSTNode<K,E> root; // Root of BST int nodecount; // Size of BST

/** Constructor */ BST() { root = null; nodecount = 0; }

/** Reinitialize tree */ public void clear() { root = null; nodecount = 0; }

/** Insert a record into the tree. @param k Key value of the record. @param e The record to insert. */ public void insert(K k, E e) { root = inserthelp(root, k, e); nodecount++; }

Page 136: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

136

BST (2) /** Remove a record from the tree. @param k Key value of record to remove. @return Record removed, or null if there is none. */ public E remove(K k) { E temp = findhelp(root, k); // find it if (temp != null) { root = removehelp(root, k); // remove it nodecount--; } return temp; }

Page 137: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

137

BST (3) /** Remove/return root node from dictionary. @return The record removed, null if empty. */ public E removeAny() { if (root != null) { E temp = root.element(); root = removehelp(root, root.key()); nodecount--; return temp; } else return null; }

/** @return Record with key k, null if none. @param k The key value to find. */ public E find(K k) { return findhelp(root, k); }

/** @return Number of records in dictionary. */ public int size() { return nodecount; }}

Page 138: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

138

BST Searchprivate E findhelp(BSTNode<K,E> rt, K k) { if (rt == null) return null; if (rt.key().compareTo(k) > 0) return findhelp(rt.left(), k); else if (rt.key().compareTo(k) == 0) return rt.element(); else return findhelp(rt.right(), k);}

Page 139: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

139

BST Insert (1)

Page 140: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

140

BST Insert (2)private BSTNode<K,E> inserthelp(BSTNode<K,E> rt, K k, E e) { if (rt == null) return new BSTNode<K,E>(k, e); if (rt.key().compareTo(k) > 0) rt.setLeft(inserthelp(rt.left(), k, e)); else rt.setRight(inserthelp(rt.right(), k, e)); return rt;}

Page 141: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

141

Get/Remove Minimum Valueprivate BSTNode<K,E> getmin(BSTNode<K,E> rt) { if (rt.left() == null) return rt; else return getmin(rt.left());}

private BSTNode<K,E> deletemin(BSTNode<K,E> rt) { if (rt.left() == null) return rt.right(); else { rt.setLeft(deletemin(rt.left())); return rt; }}

Page 142: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

142

BST Remove (1)

Page 143: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

143

BST Remove (2)/** Remove a node with key value k @return The tree with the node removed */private BSTNode<K,E> removehelp(BSTNode<K,E> rt, K k) { if (rt == null) return null; if (rt.key().compareTo(k) > 0) rt.setLeft(removehelp(rt.left(), k)); else if (rt.key().compareTo(k) < 0) rt.setRight(removehelp(rt.right(), k));

Page 144: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

144

BST Remove (3) else { // Found it, remove it if (rt.left() == null) return rt.right(); else if (rt.right() == null) return rt.left(); else { // Two children BSTNode<K,E> temp = getmin(rt.right()); rt.setElement(temp.element()); rt.setKey(temp.key()); rt.setRight(deletemin(rt.right())); } } return rt;}

Page 145: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

145

Time Complexity of BST Operations

Find: O(d)

Insert: O(d)

Delete: O(d)

d = depth of the tree

d is O(log n) if tree is balanced. What is the worst case?

Page 146: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

146

Priority Queues (1)

Problem: We want a data structure that stores records as they come (insert), but on request, releases the record with the greatest value (removemax)

Example: Scheduling jobs in a multi-tasking operating system.

Page 147: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

147

Priority Queues (2)

Possible Solutions:- insert appends to an array or a linked list ( O(1) )

and then removemax determines the maximum by scanning the list ( O(n) )

- A linked list is used and is in decreasing order; insert places an element in its correct position ( O(n) ) and removemax simply removes the head of the list( O(1) ).

- Use a heap – both insert and removemax areO( log n ) operations

Page 148: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

148

Heaps

Heap: Complete binary tree with the heap property:

• Min-heap: All values less than child values.• Max-heap: All values greater than child values.

The values are partially ordered.

Heap representation: Normally the array-based complete binary tree representation.

Page 149: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

149

Max Heap Example

88 85 83 72 73 42 57 6 48 60

Page 150: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

150

Max Heap Implementation (1)public class MaxHeap<K extends Comparable<? super K>, E> { private KVpair<K,E>[] Heap; // Pointer to heap array private int size; // Maximum size of heap private int n; // # of things in heap

public MaxHeap(KVpair<K,E>[] h, int num, int max){ Heap = h; n = num; size = max; buildheap(); }

public int heapsize() { return n; }

public boolean isLeaf(int pos) // Is pos a leaf position?{ return (pos >= n/2) && (pos < n); }

public int leftchild(int pos) { // Leftchild position assert pos < n/2 : "Position has no left child"; return 2*pos + 1;}

public int rightchild(int pos) { // Rightchild position assert pos < (n-1)/2 : "Position has no right child"; return 2*pos + 2;}public int parent(int pos) { assert pos > 0 : "Position has no parent"; return (pos-1)/2;}

Page 151: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

151

Sift Downpublic void buildheap() // Heapify contents { for (int i=n/2-1; i>=0; i--) siftdown(i); }

private void siftdown(int pos) { assert (pos >= 0) && (pos < n) : "Illegal heap position"; while (!isLeaf(pos)) { int j = leftchild(pos); if ((j<(n-1)) && (Heap[j].key().compareTo(Heap[j+1].key()) < 0)) j++; // index of child w/ greater value if (Heap[pos].key().compareTo(Heap[j].key()) >= 0) return; DSutil.swap(Heap, pos, j); pos = j; // Move down }}

Page 152: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

152

RemoveMax, Insert

public KVpair<K,E> removemax() { assert n > 0 : "Removing from empty heap"; DSutil.swap(Heap, 0, --n); if (n != 0) siftdown(0); return Heap[n];}

public void insert(KVpair<K,E> val) { assert n < size : "Heap is full"; int curr = n++; Heap[curr] = val; // Siftup until curr parent's key > curr key while ((curr != 0) && (Heap[curr].key(). compareTo(Heap[parent(curr)].key()) > 0)) { DSutil.swap(Heap, curr, parent(curr)); curr = parent(curr); }}

Page 153: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Heap Building Analysis

• Insert into the heap one value at a time: – Push each new value down the tree from the root

to where it belongs– log i = (n log n)

• Starting with full array, work from bottom up– Since nodes below form a heap, just need to push

current node down (at worst, go to bottom)– Most nodes are at the bottom, so not far to go– (i-1) n/2i = (n)

153

Page 154: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

154

Huffman Coding Trees

ASCII codes: 8 bits per character.• Fixed-length coding.

Can take advantage of relative frequency of letters to save space.

• Variable-length coding

Build the tree with minimum external path weight.

Z K M C U D L E

2 7 24 32 37 42 42 120

Page 155: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

155

Huffman Tree Construction (1)

Page 156: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

156

Huffman Tree Construction (2)

Page 157: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

157

Assigning Codes

Letter Freq Code Bits

C 32

D 42

E 120

M 24

K 7

L 42

U 37

Z 2

Page 158: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

158

Coding and Decoding

A set of codes is said to meet the prefix property if no code in the set is the prefix of another.

Code for DEED:

Decode 1011001110111101:

Expected cost per letter:

Page 159: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

159

General Trees

Page 160: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

160

General Tree Nodeinterface GTNode<E> { public E value(); public boolean isLeaf(); public GTNode<E> parent(); public GTNode<E> leftmostChild(); public GTNode<E> rightSibling(); public void setValue(E value); public void setParent(GTNode<E> par); public void insertFirst(GTNode<E> n); public void insertNext(GTNode<E> n); public void removeFirst(); public void removeNext();}

Page 161: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

161

General Tree Traversal/** Preorder traversal for general trees */static <E> void preorder(GTNode<E> rt) { PrintNode(rt); if (!rt.isLeaf()) { GTNode<E> temp = rt.leftmostChild(); while (temp != null) { preorder(temp); temp = temp.rightSibling(); } }}

Page 162: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Parent Pointer Implementation

Page 163: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Equivalence Class Problem

The parent pointer representation is good for answering:

– Are two elements in the same tree?

/** Determine if nodes in different trees */public boolean differ(int a, int b) { Integer root1 = FIND(array[a]); Integer root2 = FIND(array[b]); return root1 != root2;}

Page 164: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Union/Find/** Merge two subtrees */public void UNION(int a, int b) { Integer root1 = FIND(a); // Find a’s root Integer root2 = FIND(b); // Find b’s root if (root1 != root2) array[root2] = root1;}

public Integer FIND(Integer curr) { if (array[curr] == null) return curr; while (array[curr] != null) curr = array[curr]; return curr;}

Want to keep the depth small.

Weighted union rule: Join the tree with fewer nodes to the tree with more nodes.

Page 165: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Equiv Class Processing (1)

Page 166: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Equiv Class Processing (2)

Page 167: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Path Compressionpublic Integer FIND(Integer curr) { if (array[curr] == null) return curr; array[curr] = FIND(array[curr]); return array[curr];}

Page 168: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

168

Lists of Children

Page 169: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

169

Leftmost Child/Right Sibling (1)

Page 170: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

170

Leftmost Child/Right Sibling (2)

Page 171: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

171

Linked Implementations (1)

Page 172: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

172

Linked Implementations (2)

Page 173: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Efficient Linked Implementation

173

Page 174: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

174

Sequential Implementations (1)

List node values in the order they would be visited by a preorder traversal.

Saves space, but allows only sequential access.

Need to retain tree structure for reconstruction.

Example: For binary trees, us a symbol to mark null links.

AB/D//CEG///FH//I//

Page 175: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

175

Sequential Implementations (2)

Example: For general trees, mark the end of each subtree.

RAC)D)E))BF)))

Page 176: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Sorting

Each record contains a field called the key.– Linear order: comparison.

Measures of cost:– Comparisons– Swaps

Page 177: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Insertion Sort (1)

Page 178: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Insertion Sort (2)static <E extends Comparable<? super E>>void Sort(E[] A) { for (int i=1; i<A.length; i) for (int j=i; (j>0) && (A[j].compareTo(A[j-1])<0); j--) DSutil.swap(A, j, j-1);}

Best Case:Worst Case:Average Case:

Page 179: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Bubble Sort (1)

Page 180: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Bubble Sort (2)static <E extends Comparable<? super E>> void Sort(E[] A) { for (int i=0; i<A.length-1; i++) for (int j=A.length-1; j>i; j--) if ((A[j].compareTo(A[j-1]) < 0)) DSutil.swap(A, j, j-1);}

Best Case:Worst Case:Average Case:

Page 181: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Selection Sort (1)

Page 182: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Selection Sort (2)static <E extends Comparable<? super E>>void Sort(E[] A) { for (int i=0; i<A.length-1; i++) { int lowindex = i; for (int j=A.length-1; j>i; j--) if (A[j].compareTo(A[lowindex]) < 0) lowindex = j; DSutil.swap(A, i, lowindex); }}

Best Case:Worst Case:Average Case:

Page 183: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Pointer Swapping

Page 184: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Summary

Insertion Bubble Selection

Comparisons:

Best Case (n) (n2) (n2)Average Case (n2) (n2) (n2)Worst Case (n2) (n2) (n2)

SwapsBest Case 0 0 (n)Average Case (n2) (n2) (n)Worst Case (n2) (n2) (n)

Page 185: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Exchange Sorting

All of the sorts so far rely on exchanges of adjacent records.

What is the average number of exchanges required?– There are n! permutations– Consider permuation X and its reverse, X’– Together, every pair requires n(n-1)/2

exchanges.

Page 186: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Shellsort

Page 187: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Shellsortstatic <E extends Comparable<? super E>> void Sort(E[] A) { for (int i=A.length/2; i>2; i/=2) for (int j=0; j<i; j++) inssort2(A, j, i); inssort2(A, 0, 1);}

/** Modified version of Insertion Sort for varying increments */static <E extends Comparable<? super E>>void inssort2(E[] A, int start, int incr) { for (int i=start+incr; i<A.length; i+=incr) for (int j=i;(j>=incr)&& (A[j].compareTo(A[j-incr])<0); j-=incr) DSutil.swap(A, j, j-incr);}

Page 188: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Quicksortstatic <E extends Comparable<? super E>>void qsort(E[] A, int i, int j) { int pivotindex = findpivot(A, i, j); DSutil.swap(A, pivotindex, j); // k will be first position in right subarray int k = partition(A, i-1, j, A[j]); DSutil.swap(A, k, j); if ((k-i) > 1) qsort(A, i, k-1); if ((j-k) > 1) qsort(A, k+1, j);}

static <E extends Comparable<? super E>>int findpivot(E[] A, int i, int j) { return (i+j)/2; }

Page 189: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Quicksort Partitionstatic <E extends Comparable<? super E>>int partition(E[] A, int l, int r, E pivot) { do { // Move bounds inward until they meet while (A[++l].compareTo(pivot)<0); while ((r!=0) && (A[--r].compareTo(pivot)>0)); DSutil.swap(A, l, r); } while (l < r); DSutil.swap(A, l, r); return l;}

The cost for partition is (n).

Page 190: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Partition Example

Page 191: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Quicksort Example

Page 192: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Cost of Quicksort

Best case: Always partition in half.Worst case: Bad partition.Average case:

T(n) = n + 1 + 1/(n-1) (T(k) + T(n-k))

Optimizations for Quicksort:– Better Pivot– Better algorithm for small sublists– Eliminate recursion

k=1

n-1

Page 193: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

MergesortList mergesort(List inlist) { if (inlist.length() <= 1)return inlist; List l1 = half of the items from inlist; List l2 = other half of items from inlist;

return merge(mergesort(l1), mergesort(l2));}

Page 194: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Mergesort Implementationstatic <E extends Comparable<? super E>>void mergesort(E[] A, E[] temp, int l, int r) { int mid = (l+r)/2; if (l == r) return; mergesort(A, temp, l, mid); mergesort(A, temp, mid+1, r); for (int i=l; i<=r; i++) // Copy subarray temp[i] = A[i]; // Do the merge operation back to A int i1 = l; int i2 = mid + 1; for (int curr=l; curr<=r; curr++) { if (i1 == mid+1) // Left sublist exhausted A[curr] = temp[i2++]; else if (i2 > r) // Right sublist exhausted A[curr] = temp[i1++]; else if (temp[i1].compareTo(temp[i2])<0) A[curr] = temp[i1++]; else A[curr] = temp[i2++]; }}

Page 195: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Optimized Mergesortvoid mergesort(E[] A, E[] temp, int l, int r) { int i, j, k, mid = (l+r)/2; if (l == r) return; // List has one element if ((mid-l) >= THRESHOLD) mergesort(A, temp, l, mid); else inssort(A, l, mid-l+1); if ((r-mid) > THRESHOLD) mergesort(A, temp, mid+1, r); else inssort(A, mid+1, r-mid); // Do merge. First, copy 2 halves to temp. for (i=l; i<=mid; i++) temp[i] = A[i]; for (j=1; j<=r-mid; j++) temp[r-j+1] = A[j+mid]; // Merge sublists back to array for (i=l,j=r,k=l; k<=r; k++) if (temp[i].compareTo(temp[j])<0) A[k] = temp[i++]; else A[k] = temp[j--];}

Page 196: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Mergesort Cost

Mergesort cost:

Mergsort is also good for sorting linked lists.

Mergesort requires twice the space.

Page 197: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Heapsortstatic <E extends Comparable<? super E>>void heapsort(E[] A) { // Heapsort MaxHeap<E> H = new MaxHeap<E>(A, A.length, A.length); for (int i=0; i<A.length; i++) // Now sort H.removemax(); // Put max at end of heap}

Use a max-heap, so that elements end up sorted within the array.

Cost of heapsort:

Cost of finding K largest elements:

Page 198: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Heapsort Example (1)

Page 199: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Heapsort Example (2)

Page 200: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Binsort (1)

A simple, efficient sort:for (i=0; i<n; i++) B[A[i]] = A[i];

Ways to generalize:– Make each bin the head of a list.– Allow more keys than records.

Page 201: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Binsort (2)static void binsort(Integer A[]) { List<Integer>[] B = (LList<Integer>[])new LList[MaxKey]; Integer item; for (int i=0; i<MaxKey; i++) B[i] = new LList<Integer>(); for (int i=0; i<A.length; i++) B[A[i]].append(A[i]); for (int i=0; i<MaxKey; i++) for (B[i].moveToStart(); (item = B[i].getValue()) != null; B[i].next()) output(item);}

Cost:

Page 202: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Radix Sort (1)

Page 203: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Radix Sort (2)static void radix(Integer[] A, Integer[] B, int k, int r, int[] count) { int i, j, rtok;

for (i=0, rtok=1; i<k; i++, rtok*=r) { for (j=0; j<r; j++) count[j] = 0; // Count # of recs for each bin on this pass for (j=0; j<A.length; j++) count[(A[j]/rtok)%r]++; // count[j] is index in B for last slot of j for (j=1; j<r; j++) count[j] = count[j-1] + count[j]; for (j=A.length-1; j>=0; j--) B[--count[(A[j]/rtok)%r]] = A[j]; for (j=0; j<A.length; j++) A[j] = B[j]; }}

Page 204: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Radix Sort Example

Page 205: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Radix Sort Cost

Cost: (nk + rk)

How do n, k, and r relate?

If key range is small, then this can be (n).

If there are n distinct keys, then the length of a key must be at least log n.– Thus, Radix Sort is (n log n) in general case

Page 206: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Empirical ComparisonSort 10 100 1K 10K 100K 1M Up Down

Insertion .00023 .007 0.66 64.98 7281.0 674420 0.04 129.05

Bubble .00035 .020 2.25 277.94 27691.0 2820680 70.64 108.69

Selection .00039 .012 0.69 72.47 7356.0 780000 69.76 69.58

Shell .00034 .008 0.14 1.99 30.2 554 0.44 0.79

Shell/O .00034 .008 0.12 1.91 29.0 530 0.36 0.64

Merge .00050 .010 0.12 1.61 19.3 219 0.83 0.79

Merge/O .00024 .007 0.10 1.31 17.2 197 0.47 0.66

Quick .00048 .008 0.11 1.37 15.7 162 0.37 0.40

Quick/O .00031 .006 0.09 1.14 13.6 143 0.32 0.36

Heap .00050 .011 0.16 2.08 26.7 391 1.57 1.56

Heap/O .00033 .007 0.11 1.61 20.8 334 1.01 1.04

Radix/4 .00838 .081 0.79 7.99 79.9 808 7.97 7.97

Radix/8 .00799 .044 0.40 3.99 40.0 404 4.00 3.99

Page 207: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Sorting Lower Bound

We would like to know a lower bound for all possible sorting algorithms.

Sorting is O(n log n) (average, worst cases) because we know of algorithms with this upper bound.

Sorting I/O takes (n) time.

We will now prove (n log n) lower bound for sorting.

Page 208: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Decision Trees

Page 209: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Lower Bound Proof

• There are n! permutations.• A sorting algorithm can be viewed as

determining which permutation has been input.• Each leaf node of the decision tree corresponds

to one permutation.• A tree with n nodes has (log n) levels, so the

tree with n! leaves has (log n!) = (n log n) levels.

Which node in the decision tree corresponds to the worst case?

Page 210: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Primary vs. Secondary Storage

Primary storage: Main memory (RAM)

Secondary Storage: Peripheral devices– Disk drives– Tape drives– Flash drives

Page 211: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Comparisons

RAM is usually volatile.

RAM is about 1/2 million times faster than disk.

Medium 1996 1997 2000 20004 2006 2007 2008

RAM $45.00 7.00 1.500 0.3500 0.1500 0.0742 0.0339

Disk 0.25 0.10 0.010 0.0010 0.0005 0.0004 0.0001

Flash ----- ----- ----- 0.1000 0.0900 0.0098 0.0029

Floppy 0.50 0.36 0.250 0.2500 ----- ----- -----

Tape 0.03 0.01 0.001 0.0003 ----- ----- -----

Page 212: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Golden Rule of File Processing

Minimize the number of disk accesses!1. Arrange information so that you get what you want

with few disk accesses.2. Arrange information to minimize future disk accesses.

An organization for data on disk is often called a file structure.

Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses.

Page 213: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Disk Drives

Page 214: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Sectors

A sector is the basic unit of I/O.

Page 215: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Terms

Locality of Reference: When record is read from disk, next request is likely to come from near the same place on the disk.

Cluster: Smallest unit of file allocation, usually several sectors.

Extent: A group of physically contiguous clusters.

Internal fragmentation: Wasted space within sector if record size does not match sector size; wasted space within cluster if file size is not a multiple of cluster size.

Page 216: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Seek Time

Seek time: Time for I/O head to reach desired track. Largely determined by distance between I/O head and desired track.

Track-to-track time: Minimum time to move from one track to an adjacent track.

Average Access time: Average time to reach a track for random access.

Page 217: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Other Factors

Rotational Delay or Latency: Time for data to rotate under I/O head.

– One half of a rotation on average.– At 7200 rpm, this is 8.3/2 = 4.2ms.

Transfer time: Time for data to move under the I/O head.

– At 7200 rmp: Number of sectors read/Number of sectors per track * 8.3ms.

Page 218: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Disk Spec Example

16.8 GB disk on 10 platters = 1.68GB/platter13,085 tracks/platter256 sectors/track512 bytes/sectorTrack-to-track seek time: 2.2 msAverage seek time: 9.5ms4KB clusters, 32 clusters/track.5400RPM

Page 219: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Disk Access Cost Example (1)

Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

Assume all records are on 8 contiguous tracks.

First track: 9.5 + (11.1)(1.5) = 26.2 ms

Remaining 7 tracks: 2.2 + (11.1)(1.5) = 18.9ms.

Total: 26.2 + 7 * 18.9 = 158.5ms

Page 220: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Disk Access Cost Example (2)

Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

Assume all file clusters are randomly spread across the disk.

256 clusters. Cluster read time is 8/256 of a rotation for about 5.9ms for

both latency and read time.

256(9.5 + 5.9) is about 3942ms or nearly 4 sec.

Page 221: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

How Much to Read?

Read time for one track:9.5 + (11.1)(1.5) = 26.2ms

Read time for one sector:9.5 + 11.1/2 + (1/256)11.1 = 15.1ms

Read time for one byte:9.5 + 11.1/2 = 15.05ms

Nearly all disk drives read/write one sector (or more) at every I/O access

– Also referred to as a page or block

Page 222: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Recent Drive Specs

• Samsung Spinpoint T166

• 500GB (nominal)

• 7200 RPM

• Track to track: 0.8 ms

• Average track access: 8.9 ms

• Bytes/sector 512

• 6 surfaces/heads222

Page 223: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Buffers

The information in a sector is stored in a buffer or cache.

If the next I/O access is to the same buffer, then no need to go to disk.

There are usually one or more input buffers and one or more output buffers.

Page 224: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Buffer Pools

A series of buffers used by an application to cache disk data is called a buffer pool.

Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and “swapping” between disk and RAM.

Page 225: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Buffer Pools

Page 226: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Organizing Buffer Pools

Which buffer should be replaced when new data must be read?

First-in, First-out: Use the first one on the queue.

Least Frequently Used (LFU): Count buffer accesses, reuse the least used.

Least Recently used (LRU): Keep buffers on a linked list. When buffer is accessed, bring it to front. Reuse the one at end.

Page 227: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Bufferpool ADT: Message Passing

/** Buffer pool: message-passing style */public interface BufferPoolADT { /** Copy "sz" bytes from "space" to position "pos" in the buffered storage */ public void insert(byte[] space, int sz, int pos); /** Copy "sz" bytes from position "pos" of the buffered storage to "space". */ public void getbytes(byte[] space, int sz, int pos);}

Page 228: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Bufferpool ADT: Buffer Passing

/** Buffer pool: buffer-passing style */public interface BufferPoolADT { /** Return pointer to requested block */ public byte[] getblock(int block);

/** Set the dirty bit for the buffer holding "block" */ public void dirtyblock(int block);

/** Tell the size of a buffer */ public int blocksize();}

228

Page 229: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Design Issues

Disadvantage of message passing:• Messages are copied and passed back and

forth. Disadvantages of buffer passing:• The user is given access to system memory (the

buffer itself)• The user must explicitly tell the buffer pool when

buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed.

• The pointer might become stale when the bufferpool replaces the contents of a buffer.

Page 230: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Some Goals

• Be able to avoid reading data when the block contents will be replaced.

• Be able to support multiple users accessing a buffer, and indpendantly releasing a buffer.

• Don’t make an active buffer stale.

230

Page 231: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Improved Interface

public interface BufferPoolADT { Buffer acquireBuffer(int block);}

public interface BufferADT { // Read the block from disk public byte[] readBlock(); // Just get pointer to space, no read public byte[] getDataPointer(); // Contents have changed public void markDirty(); // Release access to the block public void release();}

231

Page 232: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Programmer’s View of Files

Logical view of files:– An a array of bytes.– A file pointer marks the current position.

Three fundamental operations:– Read bytes from current position (move file

pointer)– Write bytes to current position (move file

pointer)– Set file pointer to specified byte position.

Page 233: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Java File Functions

RandomAccessFile(String name, String mode)

close()

read(byte[] b)

write(byte[] b)

seek(long pos)

Page 234: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

External Sorting

Problem: Sorting data sets too large to fit into main memory.

– Assume data are stored on disk drive.

To sort, portions of the data must be brought into main memory, processed, and returned to disk.

An external sort should minimize disk accesses.

Page 235: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Model of External Computation

Secondary memory is divided into equal-sized blocks (512, 1024, etc…)

A basic I/O operation transfers the contents of one disk block to/from main memory.

Under certain circumstances, reading blocks of a file in sequential order is more efficient. (When?)

Primary goal is to minimize I/O operations.

Assume only one disk drive is available.

Page 236: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Key Sorting

Often, records are large, keys are small.– Ex: Payroll entries keyed on ID number

Approach 1: Read in entire records, sort them, then write them out again.

Approach 2: Read only the key values, store with each key the location on disk of its associated record.

After keys are sorted the records can be read and rewritten in sorted order.

Page 237: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Simple External Mergesort (1)

Quicksort requires random access to the entire set of records.

Better: Modified Mergesort algorithm.– Process n elements in (log n) passes.

A group of sorted records is called a run.

Page 238: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Simple External Mergesort (2)

• Split the file into two files.• Read in a block from each file.• Take first record from each block, output them in

sorted order.• Take next record from each block, output them

to a second file in sorted order.• Repeat until finished, alternating between output

files. Read new input blocks as needed.• Repeat steps 2-5, except this time input files

have runs of two sorted records that are merged together.

• Each pass through the files provides larger runs.

Page 239: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Simple External Mergesort (3)

Page 240: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Problems with Simple Mergesort

Is each pass through input and output files sequential?

What happens if all work is done on a single disk drive?

How can we reduce the number of Mergesort passes?

In general, external sorting consists of two phases:– Break the files into initial runs– Merge the runs together into a single run.

Page 241: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Breaking a File into Runs

General approach:– Read as much of the file into memory as

possible.– Perform an in-memory sort.– Output this group of records as a single run.

Page 242: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Replacement Selection (1)

• Break available memory into an array for the heap, an input buffer, and an output buffer.

• Fill the array from disk.• Make a min-heap.• Send the smallest value (root) to the

output buffer.

Page 243: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Replacement Selection (2)

• If the next key in the file is greater than the last value output, then

– Replace the root with this keyelse– Replace the root with the last key in the

arrayAdd the next record in the file to a new heap

(actually, stick it at the end of the array).

Page 244: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

RS Example

Page 245: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Snowplow Analogy (1)

Imagine a snowplow moving around a circular track on which snow falls at a steady rate.

At any instant, there is a certain amount of snow S on the track. Some falling snow comes in front of the plow, some behind.

During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution.

Thus, the plow removes 2S amount of snow.

Page 246: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Snowplow Analogy (2)

Page 247: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Problems with Simple Merge

Simple mergesort: Place runs into two files.– Merge the first two runs to output file, then

next two runs, etc.

Repeat process until only one run remains.– How many passes for r initial runs?

Is there benefit from sequential reading?Is working memory well used?Need a way to reduce the number of

passes.

Page 248: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Multiway Merge (1)

With replacement selection, each initial run is several blocks long.

Assume each run is placed in separate file.

Read the first block from each file into memory and perform an r-way merge.

When a buffer becomes empty, read a block from the appropriate run file.

Each record is read only once from disk during the merge process.

Page 249: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Multiway Merge (2)

In practice, use only one file and seek to appropriate block.

Page 250: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Limits to Multiway Merge (1)

Assume working memory is b blocks in size.

How many runs can be processed at one time?

The runs are 2b blocks long (on average).

How big a file can be merged in one pass?

Page 251: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Limits to Multiway Merge (2)

Larger files will need more passes -- but the run size grows quickly!

This approach trades (log b) (possibly) sequential passes for a single or very few random (block) access passes.

Page 252: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

General Principles

A good external sorting algorithm will seek to do the following:

– Make the initial runs as long as possible.– At all stages, overlap input, processing and

output as much as possible.– Use as much working memory as possible.

Applying more memory usually speeds processing.

– If possible, use additional disk drives for more overlapping of processing with I/O, and allow for more sequential file processing.

Page 253: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Search

Given: Distinct keys k1, k2, …, kn and collection L of n records of the form

(k1, I1), (k2, I2), …, (kn, In)where Ij is the information associated with key

kj for 1 <= j <= n.

Search Problem: For key value K, locate the record (kj, Ij) in L such that kj = K.

Searching is a systematic method for locating the record(s) with key value kj = K.

Page 254: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Successful vs. Unsuccessful

A successful search is one in which a record with key kj = K is found.

An unsuccessful search is one in which no record with kj = K is found (and presumably no such record exists).

Page 255: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Approaches to Search

1. Sequential and list methods (lists, tables, arrays).

2. Direct access by key value (hashing)

3. Tree indexing methods.

Page 256: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Average Cost for Sequential Search

• How many comparisons does sequential search do on average?

• We must know the probability of occurrence for each possible input.

• Must K be in L?• For analysis, ignore everything except the

position of K in L. Why?• What are the n + 1 events?

Page 257: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Average Cost (cont)• Let ki = I+1 be the number of comparisons

when X = L[i].

• Let kn = n be the number of comparisons when X is not in L.

• Let pi be the probability that X = L[i].

• Let pn be the probability that X is not in L[i] for any I.

n

ii

n

iii ipnppkpkn

10

100)(T

Page 258: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Generalizing Average Cost

What happens to the equation if we assume all pi's are equal (except p0)?

Depending on the value of p0,

(n+1)/2 < T(n) < n.

2

)1(10)( 0

1

npnippn

n

i

T

Page 259: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Searching Ordered Arrays• Change the model: Assume that the elements are

in ascending order.• Is linear search still optimal? Why not?• Optimization: Use linear search, but test if the

element is greater than K. Why?• Observation: If we look at L[5] and find that K is

bigger, then we rule out L[1] to L[4] as well.• More is Better: If K > L[n], then we know in one

test that K is not in L. – What is wrong here?

Page 260: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Jump Search

• What is the right amount to jump?

• Algorithm:– Check every k'th element (L[k], L[2k], ...).– If K is greater, then go on.– If K is less, then use linear search on the k

elements.

• This is called Jump Search.

Page 261: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Analysis of Jump Search

If mk <= n < (m+1)k, then the total cost is at most m + k – 1 3-way comparisons.

What should k be?

11),(

kk

nkmknT

1min

1k

k

nnk

Page 262: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Jump Search Analysis (cont)

Take the derivative and solve for T'(x) = 0 to find the minimum.

This is a minimum when

What is the worst case cost?

Roughly

nk

n2

Page 263: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Lessons

We want to balance the work done while selecting a sublist with the work done while searching a sublist.

In general, make sub-problems of equal effort.

This is an example of divide and conquer.

What if we extend this to three levels?

Page 264: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Interpolation Search

(Also known as Dictionary Search)

Search L at a position that is appropriate to the value K.

Repeat as necessary to recalculate p for future searches.

]1[][

]1[

LL

L

n

Kp

Page 265: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Quadratic Binary Search

(This is easier to analyze.)

Compute p and examine

If then sequentially probe

Until we reach a value less than or equal to K.

Similar for

][L pn

][ pnK L

...3,2,1],[ inipnL

][ pnLK

Page 266: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Quadratic Binary Search (cont)

We are now within positions of K.

ASSUME (for now) that this takes a constant number of comparisons.

We now have a sublist of size .

Repeat the process recursively.

What is the cost?

n

n

Page 267: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

QBS Probe Count

QBS cost is (log log n) if the number of probes on jump search is constant.

From Cebysev’s inequality, we can show that on uniformly distributed data, the average number of probes required will be about 2.4.

Is this better than binary search? Theoretically, yes (in the average case).

Page 268: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Comparison

n log n log log n Diff

16 4 2 2

256 8 3 2.7

64k 16 4 4

232 32 5 6.4

n log n-1 2.4 log log n Diff

16 3 4.8 worse

256 7 7.2 same

64k 15 9.6 1.6

232 31 12 2.6

Page 269: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Lists Ordered by Frequency

Order lists by (expected) frequency of occurrence.– Perform sequential search

Cost to access first record: 1Cost to access second record: 2

Expected search cost:

....21 21 nn npppC

Page 270: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Examples(1)

(1) All records have equal frequency.

2/)1(/1

nniCn

i

n

Page 271: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Examples(2)

(2) Geometric Frequency

ni

nip

n

i

i

if2/1

11if2/11

{

n

i

in iC

1

.2)2/(

Page 272: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Zipf Distributions

Applications:– Distribution for frequency of word usage in

natural languages.– Distribution for populations of cities, etc.

80/20 rule:– 80% of accesses are to 20% of the records.– For distributions following 80/20 rule,

n

iennn nnniiC

1

.log/H/Η/

.122.0 nCn

Page 273: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Self-Organizing Lists

Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses.

Self-organizing lists use a heuristic for deciding how to reorder the list. These heuristics are similar to the rules for managing buffer pools.

Page 274: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Heuristics

1. Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.)

2. When a record is found, swap it with the first record on list.

3. Move-to-Front: When a record is found, move it to the front of the list.

4. Transpose: When a record is found, swap it with the record ahead of it.

Page 275: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Text Compression Example

Application: Text Compression.

Keep a table of words already seen, organized via Move-to-Front heuristic.

• If a word not yet seen, send the word.• Otherwise, send (current) index in the table.

The car on the left hit the car I left.The car on 3 left hit 3 5 I 5.

This is similar in spirit to Ziv-Lempel coding.

Page 276: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Searching in Sets

For dense sets (small range, high percentage of elements in set).

Can use logical bit operators.

Example: To find all primes that are odd numbers, compute:0011010100010100 & 0101010101010101

Document processing: Signature files

Page 277: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

277

Indexing

Goals:– Store large files– Support multiple search keys– Support efficient insert, delete, and range

queries

Page 278: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

278

Files and Indexing

Entry sequenced file: Order records by time of insertion.– Search with sequential search

Index file: Organized, stores pointers to actual records.– Could be organized with a tree or other data

structure.

Page 279: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

279

Keys and Indexing

Primary Key: A unique identifier for records. May be inconvenient for search.

Secondary Key: An alternate search key, often not unique for each record. Often used for search key.

Page 280: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

280

Linear Indexing (1)Linear index: Index file organized as a

simple sequence of key/record pointer pairs with key values are in sorted order.

Linear indexing is good for searching variable-length records.

Page 281: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

281

Linear Indexing (2)

If the index is too large to fit in main memory, a second-level index might be used.

Page 282: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

282

Tree Indexing (1)

Linear index is poor for insertion/deletion.

Tree index can efficiently support all desired operations:– Insert/delete– Multiple search keys (multiple indices)– Key range search

Page 283: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

283

Tree Indexing (2)

Difficulties when storing tree index on disk:– Tree must be balanced.– Each path from root to leaf

should cover few disk pages.

Page 284: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

284

2-3 Tree

A 2-3 Tree has the following properties:1. A node contains one or two keys2. Every internal node has either two children

(if it contains one key) or three children (if it contains two keys).

3. All leaves are at the same level in the tree, so the tree is always height balanced.

The 2-3 Tree has a search tree property analogous to the BST.

Page 285: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

285

2-3 Tree Example

The advantage of the 2-3 Tree over the BST is that it can be updated at low cost.

Page 286: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

286

2-3 Tree Insertion (1)

Page 287: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

287

2-3 Tree Insertion (2)

Page 288: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

288

2-3 Tree Insertion (3)

Page 289: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

289

B-Trees (1)

The B-Tree is an extension of the 2-3 Tree.

The B-Tree is now the standard file organization for applications requiring insertion, deletion, and key range searches.

Page 290: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

290

B-Trees (2)

1. B-Trees are always balanced.2. B-Trees keep similar-valued records

together on a disk page, which takes advantage of locality of reference.

3. B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.

Page 291: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

291

B-Tree Definition

A B-Tree of order m has these properties:– The root is either a leaf or has two children.– Each node, except for the root and the

leaves, has between m/2 and m children.– All leaves are at the same level in the tree,

so the tree is always height balanced.

A B-Tree node is usually selected to match the size of a disk block.

– A B-Tree node could have hundreds of children.

Page 292: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

292

B-Tree SearchGeneralizes search in a 2-3 Tree.

1. Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.

2. Otherwise, follow the proper branch and repeat the process.

Page 293: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

293

B+-Trees

The most commonly implemented form of the B-Tree is the B+-Tree.

Internal nodes of the B+-Tree do not store record -- only key values to guild the search.

Leaf nodes store records or pointers to records.

A leaf node may store more or less records than an internal node stores keys.

Page 294: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

294

B+-Tree Example

Page 295: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

295

B+-Tree Insertion

Page 296: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

296

B+-Tree Deletion (1)

Page 297: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

297

B+-Tree Deletion (2)

Page 298: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

298

B+-Tree Deletion (3)

Page 299: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

299

B-Tree Space Analysis (1)

B+-Trees nodes are always at least half full.

The B*-Tree splits two pages for three, and combines three pages into two. In this way, nodes are always 2/3 full.

Asymptotic cost of search, insertion, and deletion of nodes from B-Trees is (log n).

– Base of the log is the (average) branching factor of the tree.

Page 300: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

300

B-Tree Space Analysis (2)

Example: Consider a B+-Tree of order 100 with leaf nodes containing 100 records.

1 level B+-tree:2 level B+-tree:3 level B+-tree:4 level B+-tree:

Ways to reduce the number of disk fetches:– Keep the upper levels in memory.– Manage B+-Tree pages with a buffer pool.

Page 301: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Graphs

A graph G = (V, E) consists of a set of vertices V, and a set of edges E, such that each edge in E is a connection between a pair of vertices in V.

The number of vertices is written |V|, and the number edges is written |E|.

Page 302: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Graphs (2)

Page 303: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Paths and Cycles

Path: A sequence of vertices v1, v2, …, vn of length n-1 with an edge from vi to vi+1 for 1 <= i < n.

A path is simple if all vertices on the path are distinct.

A cycle is a path of length 3 or more that connects vi to itself.

A cycle is simple if the path is simple, except the first and last vertices are the same.

Page 304: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Connected Components

An undirected graph is connected if there is at least one path from any vertex to any other.

The maximum connected subgraphs of an undirected graph are called connected components.

Page 305: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Directed Representation

Page 306: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Undirected Representation

Page 307: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Representation Costs

Adjacency Matrix:

Adjacency List:

Page 308: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Graph ADTinterface Graph { // Graph class ADT public void Init(int n); // Initialize public int n(); // # of vertices public int e(); // # of edges public int first(int v); // First neighbor public int next(int v, int w); // Neighbor public void setEdge(int i, int j, int wght); public void delEdge(int i, int j); public boolean isEdge(int i, int j); public int weight(int i, int j); public void setMark(int v, int val); public int getMark(int v); // Get v’s Mark}

Page 309: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Graph Traversals

Some applications require visiting every vertex in the graph exactly once.

The application may require that vertices be visited in some special order based on graph topology.

Examples:– Artificial Intelligence Search– Shortest paths problems

Page 310: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Graph Traversals (2)

To insure visiting all vertices:

void graphTraverse(Graph G) { int v; for (v=0; v<G.n(); v++) G.setMark(v, UNVISITED); // Initialize for (v=0; v<G.n(); v++) if (G.getMark(v) == UNVISITED) doTraverse(G, v);}

Page 311: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Depth First Search (1)// Depth first search void DFS(Graph G, int v) { PreVisit(G, v); // Take appropriate action G.setMark(v, VISITED); for (int w = G.first(v); w < G.n(); w = G.next(v, w)) if (G.getMark(w) == UNVISITED) DFS(G, w); PostVisit(G, v); // Take appropriate action}

Page 312: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Depth First Search (2)

Cost: (|V| + |E|).

Page 313: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Breadth First Search (1)

Like DFS, but replace stack with a queue.– Visit vertex’s neighbors before continuing

deeper in the tree.

Page 314: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Breadth First Search (2)

void BFS(Graph G, int start) { Queue<Integer> Q = new AQueue<Integer>(G.n()); Q.enqueue(start); G.setMark(start, VISITED); while (Q.length() > 0) { // For each vertex int v = Q.dequeue(); PreVisit(G, v); // Take appropriate action for (int w = G.first(v); w < G.n(); w = G.next(v, w)) if (G.getMark(w) == UNVISITED) { // Put neighbors on Q G.setMark(w, VISITED); Q.enqueue(w); } PostVisit(G, v); // Take appropriate action }}

Page 315: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Breadth First Search (3)

Page 316: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Topological Sort (1)

Problem: Given a set of jobs, courses, etc., with prerequisite constraints, output the jobs in an order that does not violate any of the prerequisites.

Page 317: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Topological Sort (2)void topsort(Graph G) { for (int i=0; i<G.n(); i++) G.setMark(i, UNVISITED); for (int i=0; i<G.n(); i++) if (G.getMark(i) == UNVISITED) tophelp(G, i);}

void tophelp(Graph G, int v) { G.setMark(v, VISITED); for (int w = G.first(v); w < G.n(); w = G.next(v, w)) if (G.getMark(w) == UNVISITED) tophelp(G, w); printout(v);}

Page 318: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Topological Sort (3)

Page 319: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Queue-Based Topsortvoid topsort(Graph G) { Queue<Integer> Q = new AQueue<Integer>(G.n()); int[] Count = new int[G.n()]; int v, w; for (v=0; v<G.n(); v++) Count[v] = 0; for (v=0; v<G.n(); v++) for (w=G.first(v); w<G.n(); w=G.next(v, w)) Count[w]++; for (v=0; v<G.n(); v++) if (Count[v] == 0) Q.enqueue(v); while (Q.length() > 0) { v = Q.dequeue().intValue(); printout(v); for (w=G.first(v); w<G.n(); w=G.next(v, w)) { Count[w]--; if (Count[w] == 0) Q.enqueue(w); } }}

Page 320: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Shortest Paths Problems

Input: A graph with weights or costs associated with each edge.

Output: The list of edges forming the shortest path.

Sample problems:– Find shortest path between two named vertices– Find shortest path from S to all other vertices– Find shortest path between all pairs of vertices

Will actually calculate only distances.

Page 321: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Shortest Paths Definitions

d(A, B) is the shortest distance from vertex A to B.

w(A, B) is the weight of the edge connecting A to B.– If there is no such edge, then w(A, B) = .

Page 322: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Single-Source Shortest Paths

Given start vertex s, find the shortest path from s to all other vertices.

Try 1: Visit vertices in some order, compute shortest paths for all vertices seen so far, then add shortest path to next vertex x.

Problem: Shortest path to a vertex already processed might go through x.

Solution: Process vertices in order of distance from s.

Page 323: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Example Graph

Page 324: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Dijkstra’s Algorithm Example

A B C D E

Initial 0

Process A 0 10 3 20

Process C 0 5 3 20 18

Process B 0 5 3 10 18

Process D 0 5 3 10 18

Process E 0 5 3 10 18

Page 325: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Dijkstra’s Implementation// Compute shortest path distances from s,// store them in Dvoid Dijkstra(Graph G, int s, int[] D) { for (int i=0; i<G.n(); i++) // Initialize D[i] = Integer.MAX_VALUE; D[s] = 0; for (int i=0; i<G.n(); i++) { int v = minVertex(G, D); G.setMark(v, VISITED); if (D[v] == Integer.MAX_VALUE) return; for (int w = G.first(v); w < G.n(); w = G.next(v, w)) if (D[w] > (D[v] + G.weight(v, w))) D[w] = D[v] + G.weight(v, w); }}

Page 326: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Implementing minVertex

Issue: How to determine the next-closest vertex? (I.e., implement minVertex)

Approach 1: Scan through the table of current distances.– Cost: (|V|2 + |E|) = (|V|2).

Approach 2: Store unprocessed vertices using a min-heap to implement a priority queue ordered by D value. Must update priority queue for each edge.– Cost: ((|V| + |E|)log|V|)

Page 327: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Approach 1int minVertex(Graph G, int[] D) { int v = 0; // Initialize to unvisited vertex; for (int i=0; i<G.n(); i++) if (G.getMark(i) == UNVISITED) { v = i; break; } for (int i=0; i<G.n(); i++) // Now find smallest value if ((G.getMark(i) == UNVISITED) && (D[i] < D[v])) v = i; return v;}

Page 328: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Approach 2void Dijkstra(Graph G, int s, int[] D) { int v, w; DijkElem[] E = new DijkElem[G.e()]; E[0] = new DijkElem(s, 0); MinHeap<DijkElem> H = new MinHeap<DijkElem>(E, 1, G.e()); for (int i=0; i<G.n(); i++) D[i] = Integer.MAX_VALUE; D[s] = 0; for (int i=0; i<G.n(); i++) { do { v = (H.removemin()).vertex(); } while (G.getMark(v) == VISITED); G.setMark(v, VISITED); if (D[v] == Integer.MAX_VALUE) return; for (w=G.first(v); w<G.n(); w=G.next(v, w)) if (D[w] > (D[v] + G.weight(v, w))) { D[w] = D[v] + G.weight(v, w); H.insert(new DijkElem(w, D[w])); } }}

Page 329: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Minimal Cost Spanning Trees

Minimal Cost Spanning Tree (MST) Problem:

Input: An undirected, connected graph G.Output: The subgraph of G that 1) has minimum total cost as measured by

summing the values of all the edges in the subset, and

2) keeps the vertices connected.

Page 330: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

MST Example

Page 331: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Prim’s MST Algorithm// Compute a minimal-cost spanning treevoid Prim(Graph G, int s, int[] D, int[] V) { int v, w; for (int i=0; i<G.n(); i++) // Initialize D[i] = Integer.MAX_VALUE; D[s] = 0; for (int i=0; i<G.n(); i++) { v = minVertex(G, D); G.setMark(v, VISITED); if (v != s) AddEdgetoMST(V[v], v); if (D[v] == Integer.MAX_VALUE) return; for (w=G.first(v); w<G.n(); w=G.next(v, w)) if (D[w] > G.weight(v, w)) { D[w] = G.weight(v, w); V[w] = v; } }}

Page 332: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Alternate Implementation

As with Dijkstra’s algorithm, the key issue is determining which vertex is next closest.

As with Dijkstra’s algorithm, the alternative is to use a priority queue.

Running times for the two implementations are identical to the corresponding Dijkstra’s algorithm implementations.

Page 333: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Kruskal’s MST Algorithm (1)

Initially, each vertex is in its own MST.

Merge two MST’s that have the shortest edge between them.– Use a priority queue to order the unprocessed

edges. Grab next one at each step.

How to tell if an edge connects two vertices already in the same MST?– Use the UNION/FIND algorithm with parent-

pointer representation.

Page 334: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Kruskal’s MST Algorithm (2)

Page 335: Coursenotes CS3114: Data Structures and Algorithms* Clifford A. Shaffer Department of Computer Science Virginia Tech Copyright © 2008 *Temporarily listed.

Kruskal’s MST Algorithm (3)

Cost is dominated by the time to remove edges from the heap.– Can stop processing edges once all vertices

are in the same MST

Total cost: (|V| + |E| log |E|).