91.102 - Computing II


Page 1: 91.102 - Computing II

Efficiency, Notation and Mathematical Ideas.

How can we tell that “one way of doing something is better than another”?

The answer has to be pretty much independent of when the question is asked and of what hardware we are using to execute the program.

Case in point (Fall 98): Apple Computer claims that its G3 processors are roughly twice as fast - in terms of throughput - as Intel Pentium II processors running at the same clock speed, and has the benchmarks to “prove it”. A PC magazine, using different benchmarks, shows that Intel processors running at the same clock speed can be many times faster than the G3s doing the same tasks. Who’s right??

Page 2: 91.102 - Computing II

Both and neither… How’s that possible?

By comparing apples and oranges, and exploiting the fact that programs run in a complete environment, using all kinds of resources.

Example: if you run a 3D graphics program, and one machine has a very good graphics accelerator while the other has a mediocre one, the performance difference can be a factor of 10 or more. Neither benchmark really measures CPU performance…

On the other hand, showing that one CPU can perform integer operations three times as fast as the other may not lead to any advantage that an “average user” can see, unless the whole system is designed to exploit this capability: unfortunately, it is user programs that determine what kinds of operations are performed most often...

Page 3: 91.102 - Computing II

Furthermore, compilers differ enormously in the quality of the code they generate, and NOT uniformly over all the code.

The very different architectures of the Apple and Wintel platforms provide a wonderful environment for creating skewed comparisons and conflicting claims that cannot be resolved to everyone’s satisfaction… a bit like politics…

Page 4: 91.102 - Computing II

Another case in point (Oct. 2001): AMD vs Intel.

Athlon vs Pentium.

AMD wants to push for a benchmark different from the "Gigahertz" one - for the same reason Apple pushed for a different one in 1998: clock speed (GHz) and throughput are NOT the same thing…

Page 5: 91.102 - Computing II

So, what do we do?

We must find ways that are “essentially independent” of hardware and language - where only the algorithm “really counts”. This means that we will get results that are “general” but maybe not so precise that we can “really decide” between two ways of performing a task that give us nearly equal predictions.

This is already hard enough, as it turns out...

Page 6: 91.102 - Computing II

A reasonable measure to use is the size of the input to the program: generally, the more items in the input, the more time and space it will take to generate the output.

TIME and SPACE are the other measures: how long will it take the program to run, given an input of a certain size? How much space will the program require, given an input of a certain size?

A reasonable criterion to use to set up a time comparison is the dominant operation one: which operation (assignment, comparison, function call, etc.) is the one that best characterizes the processing of the input data?

Page 7: 91.102 - Computing II

One could pick some input data, run the program and measure. Why is that not enough?

Unfortunately the size of the input may not be a reliable predictor, in the sense that different runs with different inputs of the same size could give very different results.

The empirical results simply relate the sizes of some specific inputs to time-to-completion or space used: it's up to the analyst to determine how the relation between input size and resources used generalizes beyond those specific runs… this is what makes the whole thing quite hard, but it may also explain the fluctuations in the input/resource relation.

Page 8: 91.102 - Computing II

As usual, let’s pick some simple algorithm and try to find out what is going on. The text takes SelectionSort, and runs it with input sets of different sizes on two different machines. It gets a table:

Array Size = n    Home Computer    Desktop Computer
      125               12.5                2.8
      250               49.3               11.0
      500              195.8               43.4
     1000              780.3              172.4
     2000             3114.9              690.5

Page 9: 91.102 - Computing II

Timeout: Selection Sorting in DESCENDING order.

void SelectionSort(ItemType *inArray, int m, int n)
{
    int maxPosition;                            /* index of the largest remaining item */
    ItemType temp;                              /* ItemType (rather than int), so the swap works for any item type */

    if (m < n) {
        maxPosition = FindMax(inArray, m, n);   /* find the largest item in inArray[m..n] */
        temp = inArray[m];                      /* swap it into position m */
        inArray[m] = inArray[maxPosition];
        inArray[maxPosition] = temp;
        SelectionSort(inArray, m+1, n);         /* sort the remaining positions m+1..n */
    }
}
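To see where numbers like those in the table on the previous slide might come from, here is a minimal timing sketch (not from the text: the element type, the pseudo-random fill and the use of clock() are assumptions made for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef int ItemType;

int FindMax(ItemType *inArray, int m, int n)          /* shown a few slides below */
{
    int i = m, j = m;
    do {
        i++;
        if (inArray[i] > inArray[j]) j = i;
    } while (i != n);
    return j;
}

void SelectionSort(ItemType *inArray, int m, int n)   /* the routine shown above */
{
    if (m < n) {
        int maxPosition = FindMax(inArray, m, n);
        ItemType temp = inArray[m];
        inArray[m] = inArray[maxPosition];
        inArray[maxPosition] = temp;
        SelectionSort(inArray, m + 1, n);
    }
}

int main(void)
{
    int sizes[] = { 125, 250, 500, 1000, 2000 };      /* the sizes used in the table */
    int s, i;

    for (s = 0; s < 5; s++) {
        int n = sizes[s];
        ItemType *A = malloc(n * sizeof(ItemType));
        clock_t start, stop;

        for (i = 0; i < n; i++)                       /* arbitrary pseudo-random input */
            A[i] = rand();

        start = clock();
        SelectionSort(A, 0, n - 1);                   /* n - 1 is the largest index */
        stop = clock();

        printf("n = %4d   time = %.4f sec\n",
               n, (double)(stop - start) / CLOCKS_PER_SEC);
        free(A);
    }
    return 0;
}

The absolute times a current machine reports will be far smaller than those in the table; what matters for the analysis is how the time grows as n doubles.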

Page 10: 91.102 - Computing II

A =      7  3  0  1  9  6  5  2  8  4
index:   0  1  2  3  4  5  6  7  8  9

FindMax(A, 0, 9) returns 4, the index of the largest value (9).

After swapping A[0] and A[4]:

A =      9  3  0  1  7  6  5  2  8  4

Page 11: 91.102 - Computing II

Let’s now plot the points:

[Chart: “Data Points From the Table” - run time vs. Array Size (0 to 2500) for the Home Computer and the Desktop Computer.]

Page 12: 91.102 - Computing II

Most spreadsheets provide a facility for “curve fitting”, i.e., they find a curve that “fits” a set of data points according to some criterion - usually a sensible one. What would we get in this case?

Page 13: 91.102 - Computing II

We plot and compute the “trendlines”:

[Chart: “Data Points From the Table” with quadratic (Poly.) trendlines, run time vs. Array Size.
Home Computer trendline:    y = 0.0008x^2 + 0.0032x - 0.0627
Desktop Computer trendline: y = 0.0002x^2 + 0.0005x + 0.0784]

Page 14: 91.102 - Computing II

You may have observed that the trendlines computed by Excel are not identical to the trendlines provided by the text - they probably used slightly different algorithms (software?) to get there…

The important thing is that the two functions allow us to “extrapolate” the cost of running SelectionSort (on the two different kinds of machines) on Data Sets of sizes DIFFERENT from the ones of the empirical study.

We also found that there seems to be a simple relationship (well approximated by a quadratic function) between run time and the size of the set to be sorted...
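For example (a quick check with the trendlines above, not a figure from the text): at n = 4000 the Home Computer fit predicts roughly 0.0008*4000^2 + 0.0032*4000 ≈ 12,813, about four times the measured value at n = 2000 - exactly what a quadratic predicts when the input size doubles.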

Page 15: 91.102 - Computing II

The basic idea is that we introduce a notation: O (big-Oh) to tell us what kind of trend we can expect.

In the two cases we just saw, we have

F1(n) = 0.0008 n^2 + 0.0032 n + 0.0627

and

F2(n) = 0.0002 n^2 + 0.0005 n + 0.0784

Since, intuitively, the square of a number grows faster than the number itself (or a constant), and the leading coefficients are positive, both functions will grow, for large n, not much worse than 0.0008 n^2 and 0.0002 n^2, respectively. It is fairly easy to show that there exist constants C1 > 0.0008 and C2 > 0.0002 so that F1(n) <= C1 n^2 and F2(n) <= C2 n^2 for all “large” n.
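For instance (a worked choice of constants, not from the slides): for n >= 1 we have n <= n^2 and 1 <= n^2, so

F1(n) = 0.0008 n^2 + 0.0032 n + 0.0627 <= (0.0008 + 0.0032 + 0.0627) n^2 = 0.0667 n^2,

so C1 = 0.0667 works for all n >= 1; the same argument gives C2 = 0.0791 for F2.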

Page 16: 91.102 - Computing II

We say that F1(n) = O(n^2), and also that F2(n) = O(n^2). The meaning of this notation (repeating ourselves) is that there exist constants C1 and C2 and positive integers N1 and N2 such that

F1(n) <= C1 n^2 for all n >= N1,

and F2(n) <= C2 n^2 for all n >= N2.

So, in some way, our notation does not really distinguish between the two… they just both grow NO WORSE than “quadratically”, even though one grows faster than the other. The crucial thing turns out to be the n^2: beyond that, for large n, F1(n) is simply about four times larger than F2(n).

Page 17: 91.102 - Computing II

Some Growth Comparisons: F(n)

n            1      10       10^2     10^3     10^4     10^5     10^6     10^7     10^8     10^9     10^10
log10(n)     0      1        2        3        4        5        6        7        8        9        10
n^(1/2)      1      10^0.5   10       10^1.5   10^2     10^2.5   10^3     10^3.5   10^4     10^4.5   10^5
n*log10(n)   0      10       2*10^2   3*10^3   4*10^4   5*10^5   6*10^6   7*10^7   8*10^8   9*10^9   10*10^10
n^2          1      10^2     10^4     10^6     10^8     10^10    10^12    10^14    10^16    10^18    10^20
n^3          1      10^3     10^6     10^9     10^12    10^15    10^18    10^21    10^24    10^27    10^30
2^n          2      1024     ≈10^30   ≈10^300  (astronomically large from here on)
n^n          1      10^10    10^200   (astronomically large from here on)

Page 18: 91.102 - Computing II

To keep things in perspective, there are, roughly, 3.15*10^13 microseconds in a year, and only 1000 times as many nanoseconds… Looking back at the table, there may well be problems where the algorithm chosen for the solution - if it matches one of the faster-growing functions - will never run to completion on more than trivially small sets of data…
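To make this concrete (an illustrative calculation, not from the slides): suppose a machine performs one dominant operation per microsecond. An n^2 algorithm on n = 10^6 items needs about 10^12 microseconds ≈ 11.6 days; an n^3 algorithm on the same input needs about 10^18 microseconds ≈ 31,700 years; and 2^n operations for n = 100 come to roughly 1.3*10^30 microseconds - about 4*10^16 years.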

We need strategies to examine a proposed algorithm and determine some bound on the amount of time we expect it to run for a data set of given size.

Page 19: 91.102 - Computing II

Problems:

a) how do we come up with a formula?

b) is there an “algebra” of formulae?

The second question really means: if we examine two algorithms and get two formulae f1(n) and f2(n), how do we compare such formulae and what do we do if, for example, the two algorithms need to be run sequentially - or in some other relationship to each other - over the same data set?

We will now give some answers to the second question, the first one being - unfortunately - much harder...

Page 20: 91.102 - Computing II

1) if f1(n) = a0 + a1*n + a2*n^2 + … + am*n^m, for large enough n we can get a decent approximation by just looking at the term of highest degree: f1(n) ≈ am*n^m. So we can conclude that

f1(n) is in (or of type) O(n^m)

This needs proof, but intuition should be adequate for now.
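For example (a made-up polynomial, just to illustrate rule 1): f(n) = 7 + 3n + 5n^2 is O(n^2), since for n >= 1 we have 7 <= 7n^2 and 3n <= 3n^2, so f(n) <= 15n^2.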

Page 21: 91.102 - Computing II

2) If two functions f1(n) and f2(n) are both in the same O-class, say O(n^17), and the coefficients of the highest degree terms are of the same sign (positive, for our purposes), then their sum is in the same O-class. If they belong to two different O-classes, their sum will belong to the ”larger” of the two O-classes.

Ex: f1(n) ∈ O(n^3) and f2(n) ∈ O(n^5), then f1(n) + f2(n) ∈ O(n^5).

This corresponds to two successive function calls, two successive program fragments, etc., where you have been able to get estimates separately, and now you want to estimate the total effect: up to a multiplicative constant, it will be no worse than that of the single worst fragment.
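A sketch of rule 2 in code (illustrative only, not from the slides; the function and the step counts are made up): two fragments executed one after the other, one O(n) and one O(n^2); the total is dominated by the worse fragment, so the whole function is O(n^2).

#include <stdio.h>

/* Two successive fragments over the same data set:
   the first is O(n), the second O(n^2); their sum is O(n^2). */
long Work(int n)
{
    long count = 0;
    int i, j;

    for (i = 0; i < n; i++)          /* fragment 1: n steps     -> O(n)   */
        count++;

    for (i = 0; i < n; i++)          /* fragment 2: n*n steps   -> O(n^2) */
        for (j = 0; j < n; j++)
            count++;

    return count;                    /* n + n*n steps in total, i.e. O(n^2) */
}

int main(void)
{
    printf("%ld\n", Work(100));      /* prints 10100 = 100 + 100*100 */
    return 0;
}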

Page 22: 91.102 - Computing II

3) If two functions f1(n) and f2(n) are in O(n^a) and O(n^b), respectively, their product f1(n) * f2(n) is of class O(n^(a+b)).

This (usually) corresponds to a loop (or a recursion), where f1(n) is, for example, the number of times the loop executes (or a function is called), and f2(n) is the cost of one execution of the loop (function) body. A sketch appears below.

4) If two functions f1(n) and f2(n) are in O(n^a) and O(n^b), respectively, their composition f1(f2(n)) (if it makes sense) is of class O(n^(a*b)). We are unlikely to see much of this, unless the output of f2 is of the same type as the input of f1. Space -> Time functions can’t be meaningfully composed.
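Here is the sketch of rule 3 (illustrative only, not from the slides; the helper and the counts are made up): a loop that executes n times and whose body does O(n) work, for a total of O(n)*O(n) = O(n^2).

#include <stdio.h>

/* O(n) helper: the cost of one execution of the loop body. */
long BodyWork(int n)
{
    long steps = 0;
    int j;
    for (j = 0; j < n; j++)
        steps++;
    return steps;
}

/* Rule 3: the loop runs n times (f1(n) in O(n)) and each iteration
   costs BodyWork(n) steps (f2(n) in O(n)), so the total is O(n^2). */
long Total(int n)
{
    long steps = 0;
    int i;
    for (i = 0; i < n; i++)
        steps += BodyWork(n);
    return steps;
}

int main(void)
{
    printf("%ld\n", Total(1000));   /* prints 1000000 = 1000 * 1000 */
    return 0;
}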

Page 23: 91.102 - Computing II

The solution to question a) is the really hard part: we will spend a fair amount of time examining all the algorithms we’ll study, trying to acquire both the techniques and the intuition needed to derive such formulae.

Let's look at SelectionSort as a first "practice" run…

SelectionSort(A, 0, n);

where A is the input array and n is the largest index that contains a value.

Page 24: 91.102 - Computing II

Selection Sorting in DESCENDING order:

void SelectionSort(ItemType *inArray, int m, int n)
{
    int maxPosition;                            /* index of the largest remaining item */
    ItemType temp;

    if (m < n) {
        maxPosition = FindMax(inArray, m, n);   /* find the largest item in inArray[m..n] */
        temp = inArray[m];                      /* swap it into position m */
        inArray[m] = inArray[maxPosition];
        inArray[maxPosition] = temp;
        SelectionSort(inArray, m+1, n);         /* sort the remaining positions */
    } // else do nothing and return...
}

If we start with m = 0, we go through n + 1 calls in total. Each call (except the last one) performs one call to FindMax, one "swap", and one recursive call to SelectionSort.
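Stated as a recurrence (a restatement of this counting, not taken from the slides): if C(m, n) is the number of array comparisons performed by SelectionSort(inArray, m, n), then C(m, n) = (n - m) + C(m+1, n) for m < n, and C(n, n) = 0, since each call does the n - m comparisons of FindMax and then recurses on a range that is one element shorter. Unwinding this recurrence gives exactly the sum evaluated two slides ahead.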

Page 25: 91.102 - Computing II

int FindMax(ItemType *inArray, int m, int n)
{
    int i = m;
    int j = m;

    do {
        i++;                              // creep up the array
        if (inArray[i] > inArray[j]) {    // so, numbers...
            j = i;                        // if you found something bigger...
        }
    } while (i != n);                     // all the way to the end

    return j;                             // return index of largest
}

The "do loop" contains one incrementation, two(?) comparisons and, possibly, one assignment. The loop itself is executed n - m times.

Page 26: 91.102 - Computing II

So: n calls to FindMax, each call to FindMax has n - m array comparisons.

Total number of comparisons:

∑_{m=0}^{n-1} (n - m) = ∑_{m=1}^{n} m = n*(n + 1)/2 = (1/2)n^2 + (1/2)n, which is O(n^2)

What about the recursive calls, swaps, incrementations, termination checks, etc.??

They can each be counted as equivalent to a fixed number of array comparisons - so all we change is the coefficient of n^2, and NOT the leading power...
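As a quick check (an illustrative counting harness, not from the text; the global counter and the test input are made up), we can count the array comparisons directly and compare them with n(n+1)/2:

#include <stdio.h>

typedef int ItemType;

static long gComparisons = 0;   /* counts array-element comparisons */

int FindMax(ItemType *inArray, int m, int n)
{
    int i = m, j = m;
    do {
        i++;
        gComparisons++;                      /* one array comparison per pass */
        if (inArray[i] > inArray[j])
            j = i;
    } while (i != n);
    return j;
}

void SelectionSort(ItemType *inArray, int m, int n)
{
    if (m < n) {
        int maxPosition = FindMax(inArray, m, n);
        ItemType temp = inArray[m];
        inArray[m] = inArray[maxPosition];
        inArray[maxPosition] = temp;
        SelectionSort(inArray, m + 1, n);
    }
}

int main(void)
{
    enum { N = 100 };                        /* N items, largest index n = 99 */
    ItemType A[N];
    int i;

    for (i = 0; i < N; i++)                  /* arbitrary input: 0, 1, ..., 99 */
        A[i] = i;

    SelectionSort(A, 0, N - 1);
    printf("comparisons = %ld, n(n+1)/2 = %d\n",
           gComparisons, (N - 1) * N / 2);   /* both print 4950 */
    return 0;
}

Note that even with the input already sorted in increasing order the routine performs every comparison: the comparison count depends only on n, not on the data.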

Page 27: 91.102 - Computing II

We can also use our "algebra of formulae":

SelectionSort(A, 0, n) ~ f1(n) ∈ O(n),

if the dominant operation is the execution of the body of the if statement - which controls the recursive call.

Within the if statement (during one recursive call), the dominant operation is given by

FindMax(A, m, n) ~ f2(n) ∈ O(n - m) = O(n),

in terms of comparisons of array elements.

The product: f1(n)*f2(n) ∈ O(n^2).

Page 28: 91.102 - Computing II

Careful:

These methods always lead to formulae and bounds that are useful “for large enough n” - i.e. for large enough data sets.

Problem: when is the data set “large enough”?

Problem: what happens with small data sets?

Some of the algorithms that are good for large data sets are awfully complicated to code: when is the overhead introduced by the code complexity more than the gain in the asymptotic behavior?

Page 29: 91.102 - Computing II

Careful:

Sequential Search vs. Binary Search…

Sequential Search is O(n) - every unsuccessful comparison reduces the set still to search by 1 item.

Binary Search is O(log2(n)) - every unsuccessful comparison reduces the set still to search to 1/2 the previous size.

The table says that for large n Binary Search is MUCH faster than Sequential Search. Is that true always? Even for “small” n? Why or Why Not?

Interpolation Search is O(log2(log2(n))), which is even better than Binary Search. Why don’t we ALWAYS use it?
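To put numbers on this (an illustrative calculation, not from the slides): for n = 1,000,000 items, Sequential Search needs about 500,000 comparisons on average (and up to 1,000,000), Binary Search needs at most about log2(1,000,000) ≈ 20, and Interpolation Search about log2(log2(1,000,000)) ≈ 4. For n = 10, however, the counts are roughly 5, 4 and 2: the asymptotic winner buys almost nothing, and its extra per-step work (and, for Interpolation Search, its assumptions about how the keys are distributed) may cost more than it saves.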

Page 30: 91.102 - Computing II

Sequential Search: (array A need not be sorted, Key need not be ORDERABLE in any usual way)

int SequentialSearch(Key K, SearchArray A)
{
    int i;

    for (i = 0; i < n; ++i) {        /* n is assumed to be the number of items in A */
        if (K == A[i]) return(i);    /* found it: return its position */
    }
    return(-1);                      /* not found */
}

Try to analyze…

Best case: the key is at A[0] (1 comparison); Worst case: it is at A[n-1], or not present at all (n comparisons); Average case: about n/2 comparisons...

Page 31: 91.102 - Computing II

Binary Search: (array A is sorted in INCREASING order of Key)

int BinarySearch(Key K, SearchArray A, int low, int high)
{
    int mid = (low + high)/2;

    if (low > high) return -1;                       // nothing there
    else if (K == A[mid]) return(mid);               // found it.
    else if (K < A[mid])                             // look in left half
        return BinarySearch(K, A, low, mid - 1);
    else                                             // look in right half
        return BinarySearch(K, A, mid + 1, high);
}