Pearls of Functional Algorithm Design Chapter 2 1 Roger L. Costello July 2011.


1

Pearls of Functional Algorithm Design

Chapter 2

Roger L. Costello, July 2011

2

The Problem We Will Solve

3

Recurring Problem

• Stock Market: each day I record the closing value of the DOW. Occasionally, I pick a date and ask, “How many days after this date has the stock market closed at a higher value?”

• A more challenging question is, “Which day has the greatest number of following days on which the stock market closed at a higher value?”

4

Date       DOW       Number of following days that surpassed this day
6/1/11     12,324    6
6/2/11     12,214    7
6/3/11     12,390    1
6/6/11     12,400    0
6/7/11     12,367    1
6/8/11     12,380    0
6/9/11     12,310    2
6/10/11    12,330    1
6/13/11    12,340    0

5

Recurring Problem (cont.)

• People’s Height: line up a bunch of people. Pick one person and ask, “How many of the following people are taller than this person?”

• A more challenging question is, “Which person has the greatest number of following people who are taller?”

6

Person    Height (inches)    Number of following persons that surpass this person’s height
Tom       72                 1
John      68                 4
George    69                 2
Jim       73                 0
Pete      65                 3
Sam       68                 2
Bill      69                 1
Mike      64                 1
Shaun     71                 0

7

Recurring Problem (cont.)

• Word Analysis: take a letter in a word and ask, “How many of the following letters are bigger (occur later in the alphabet) than this letter?”

• A more challenging question is, “Which letter has the greatest number of following letters that are bigger?”

8

Word: GENERATING

Letter    Number of following letters that surpass this letter
G         5
E         6
N         2
E         5
R         1
A         4
T         0
I         1
N         0
G         0

9

Problem Statement

• Create a list of values.
  – Example: create a list of stock market values, or a list of people’s heights, or a list of letters.

• Simple Problem: select one value in the list, and count the number of following values that surpass it.

• Harder Problem: for every value in the list solve the simple problem; this produces a list of numbers; return the maximum number.
  – This is called the “surpasser problem”.

10

Solve the Simple Problem

• Let’s create a function that counts the number of surpassers of a value.

• The function takes two arguments:
  1. The value, x
  2. A list, xs, that consists of all the values that follow x

11

Select the list items that are greater than 'G'

E N E R A T I N G

filter (>'G') ____

[N, R, T, I, N]

12

Count the selected list items

[N, R, T, I, N]

length ____

5

Five items surpass “G”. That’s the answer!

13

scount

• “scount” (surpasser count) is a user-defined function; it is the collection of functions shown on the previous two slides.

scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (>x) xs)
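As a quick check, scount can be applied to the three running examples from the earlier slides. This is only a sketch: the names dowAfterJune1, heightsAfterJohn and examples are made up for the illustration, and the data is copied from the stock market and height slides.

```haskell
-- Surpasser count, as defined on the slide above.
scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- Closing values for the days after 6/1/11 (hypothetical name).
dowAfterJune1 :: [Int]
dowAfterJune1 = [12214, 12390, 12400, 12367, 12380, 12310, 12330, 12340]

-- Heights of the people standing after John (hypothetical name).
heightsAfterJohn :: [Int]
heightsAfterJohn = [69, 73, 65, 68, 69, 64, 71]

examples :: [Int]
examples =
  [ scount 'G' "ENERATING"               -- letters after the first 'G'
  , scount (12324 :: Int) dowAfterJune1  -- days after 6/1/11
  , scount (68 :: Int) heightsAfterJohn  -- people after John
  ]
-- examples evaluates to [5,6,4]
```

The same function serves all three problems because it only requires that the values can be ordered (the Ord a constraint).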

14

Solve the Harder Problem

• We need to apply “scount” to each item in the list, producing a list of numbers; then take the maximum of the numbers.

15

Invoke “scount” multiple times

"GENERATING"

scount 'G' "ENERATING"    5
scount 'E' "NERATING"     6
scount 'N' "ERATING"      2
scount 'E' "RATING"       5
scount 'R' "ATING"        1
scount 'A' "TING"         4
scount 'T' "ING"          0
scount 'I' "NG"           1
scount 'N' "G"            0
scount 'G' ""             0

maximum: 6

16

tails

• “tails” is a standard function.
• It takes one argument, a list.
• It returns a list of lists, i.e., a list of all items, then a list of all items but the first, then a list of all items but the first and second, etc.

tails "GENERATING"

["GENERATING","ENERATING","NERATING",…,"G",""]

17

List Comprehension

• Recall that “scount” takes as arguments a value, x, and a list consisting of its following items.

• A list comprehension will be used to provide the arguments to “scount”:

[scount z zs | z : zs <- tails xs]

“For each list produced by the tails function, take its first item and the remaining items, and use them as arguments to the scount function.”
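The way the comprehension pairs each item with its followers can be seen on a short word. The sketch below uses "GENE"; the names suffixes, splits and counts are invented for the illustration. Note that the last list produced by tails is empty, does not match the pattern z : zs, and is therefore skipped.

```haskell
import Data.List (tails)

scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- Every suffix of the list, ending with the empty list.
suffixes :: [String]
suffixes = tails "GENE"
-- ["GENE","ENE","NE","E",""]

-- Each item paired with the items that follow it. The final suffix ""
-- fails the pattern z : zs, so the comprehension silently drops it.
splits :: [(Char, String)]
splits = [(z, zs) | z : zs <- tails "GENE"]
-- [('G',"ENE"),('E',"NE"),('N',"E"),('E',"")]

-- The surpasser count of each item of "GENE".
counts :: [Int]
counts = [scount z zs | z : zs <- tails "GENE"]
-- [1,1,0,0]
```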

18

Set of surpasser counts

[scount z zs | z: zs <- tails ____]

"GENERATING"

[5,6,2,5,1,4,0,1,0,0]

19

maximum surpasser count (msc)

[5,6,2,5,1,4,0,1,0,0]

maximum ____

6

That’s the answer!

20

msc

• “msc” (maximum surpasser count) is a user-defined function; it is the collection of functions shown on the previous two slides.

msc :: Ord a => [a] -> Int
msc xs = maximum [scount z zs | z : zs <- tails xs]

21

Here’s the Solution

import Data.List (tails)

-- msc = maximum surpasser count

msc :: Ord a => [a] -> Int
msc xs = maximum [scount z zs | z : zs <- tails xs]

scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (>x) xs)

22

Time Requirements

• With a list of length “n” the msc function shown on the previous slide takes on the order of n^2 steps.

• Here’s why: recall that n surpasser counts are generated (see slide 18). To generate the first surpasser count, we take the first list item and compare it against the remaining n-1 items. To generate the second surpasser count, we take the second list item and compare it against the remaining n-2 items. And so forth. So, the total number of comparisons is:

(n-1) + (n-2) + … + 1 = n(n-1)/2, i.e., T(n) = O(n^2)
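The comparison count can be checked mechanically. The sketch below counts how many followers each item is compared against and sets that beside the closed form of the sum, which works out to n(n-1)/2. The names comparisons and closedForm are made up for this illustration.

```haskell
import Data.List (tails)

-- Number of (>) comparisons made by the quadratic msc:
-- each item is compared with every item that follows it.
comparisons :: [a] -> Int
comparisons xs = sum [length zs | _ : zs <- tails xs]

-- Closed form of (n-1) + (n-2) + ... + 1.
closedForm :: Int -> Int
closedForm n = n * (n - 1) `div` 2
```

For the ten-letter word "GENERATING", comparisons gives 9 + 8 + … + 1 + 0 = 45, and closedForm 10 is also 45.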

23

Divide and Conquer Solution

24

The Key Concepts

1. Determine the maximum surpasser count (msc) of list ws.

2. Divide ws into two lists: ws = xs ++ ys

3. Determine the scount of each value in xs and the scount of each value in ys.

4. Assume that xs and ys are sorted in increasing order and ys is of length n.

5. x is the first value in xs and it has an scount (within xs) of c. y is the first value in ys and it has an scount (within ys) of d. There are two cases to consider:

a) x < y: then the scount of x equals c + n (remember, ys is sorted, so if x < y then x is less than all n values in ys, and every value in ys follows x in ws).

b) x ≥ y: then the scount of y equals d (every value in xs comes before y in ws, so none of them can be a surpasser of y; and since xs and ys are sorted, y is no greater than any remaining value, so it is output first).

25

The Simplest Example

GE

G E

Split into xs and ys

26

GE

('G',0) ('E',0)

The scount of 'G' in xs is zero and the scount of 'E' in ys is zero.

27

GE

('G',0) ('E',0)

xs is sorted in increasing order and so is ys. Obviously.

28

GE

('G',0) ('E',0)

Compare 'G' with 'E'. 'G' ≥ 'E' so 'E' must be the smallest value. Output 'E' then 'G'.

29

GE

('G',0) ('E',0)

('E',0) : ('G',0)

30

GE

('G',0) ('E',0)

('E',0) : ('G',0)

These are the correct surpasser counts for GE. Furthermore, the resulting list is sorted!

31

Another Simple Example

NE

N E

Split into xs and ys

32

NE

('N',0) ('E',0)

The scount of 'N' in xs is zero and the scount of 'E' in ys is zero.

33

NE

('N',0) ('E',0)

xs is sorted in increasing order and so is ys. Obviously.

34

NE

('N',0) ('E',0)

Compare 'N' with 'E'. 'N' ≥ 'E' so 'E' must be the smallest value. Output 'E' then 'N'.

35

NE

('N',0) ('E',0)

('E',0) : ('N',0)

36

NE

('N',0) ('E',0)

('E',0) : ('N',0)

These are the correct surpasser counts for NE. Furthermore, the resulting list is sorted!

37

A larger example

GENE

GE NE

Split into xs and ys

38

GENE

GE NE

The previous slides showed how to process the two sub-lists.

('E',0) : ('G',0) ('E',0) : ('N',0)

39

GENE

GE NE

('E',0) : ('G',0) ('E',0) : ('N',0)

Compare 'E' with 'E'. 'E' ≥ 'E' so the right 'E' must be the smallest value. Output 'E' and process the remaining sub-lists.

40

GENE

GE NE

('E',0) : ('G',0) ('N',0)

Output: ('E', 0)

41

GENE

GE NE

('E',0) : ('G',0) ('N',0)

Compare 'E' with 'N'. 'E' < 'N' so all the values in ys must be surpassers of 'E'. Output 'E', but first increment its surpasser count by length ys.

42

GENE

GE NE

('G',0) ('N',0)

Output: ('E', 0) : ('E', 1)

43

GENE

GE NE

('G',0) ('N',0)

Compare 'G' with 'N'. 'G' < 'N' so all the values in ys must be surpassers of 'G'. Output 'G', but first increment its surpasser count by length ys.

44

GENE

GE NE

('N',0)

Output: ('E', 0) : ('E', 1) : ('G', 1)

""

45

GENE

GE NE

('N',0)

Output 'N'.

""

46

GENE

GE NE

Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

"" ""

47

Surpasser Counts

GENE

Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

let zs = the list of second values in each pair

msc = the maximum of zs

48

Terminology: table

GENE

('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

The result of processing is a list of pairs. The second value is the scount of the first value. This list of pairs is called a "table".

The "table function" takes as its argument a list and returns a table.

49

Terminology: join

NE

('N',0) ('E',0)

('E',0) : ('N',0)

Processing two sub-lists to create one list is called "join".

The "join function" takes as its arguments two lists, xs and ys, and returns a table.

50

Here's how to implement the table function

table :: Ord a => [a] -> [(a,Int)]
table [w] = [(w, 0)]
table ws = join (table xs) (table ys)
  where
    m = length ws
    n = m `div` 2
    (xs,ys) = splitAt n ws

"Process a list. If there is just one value in the list then its surpasser count is zero and return a list containing one pair, where the second value is zero. If there's more than one value in the list then divide the list in half, into xs and ys; get the table of xs and the table of ys (i.e., recurse) and then join those two tables."

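The dividing step of table can be looked at on its own. The sketch below isolates it in a helper called halves (a name invented here, not part of the slides) so that the effect of splitAt on even, odd and one-item lists is visible.

```haskell
-- The split performed inside table: the first half gets
-- (length ws `div` 2) items, the second half gets the rest.
halves :: [a] -> ([a], [a])
halves ws = splitAt (length ws `div` 2) ws

-- halves "GENE" is ("GE","NE")
-- halves "GEN"  is ("G","EN")
-- halves "G"    is ("","G")
```

The last case shows why the one-item equation is needed: splitting a single item leaves it whole in the second half, so recursing without that equation would never make progress. For the same reason table, as written on the slide, has no answer for an empty list and should only be given non-empty input.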
51

Here's how to implement the join function

join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
join [] tys = tys
join txs [] = txs
join xs@((x,c):txxs) ys@((y,d):tyys)
  | x < y  = (x, c + length ys) : join txxs ys
  | x >= y = (y, d) : join xs tyys

"Join two tables, txs and tys. If txs is empty then return tys. If tys is empty then return txs. Otherwise compare the first value of txs with the first value of tys. Specifically, compare the first value of each pair, (x,c) and (y,d). If x < y then x's surpasser count is c plus the length of ys (in the third equation, xs and ys are aliases for the whole tables). If x >= y then y's surpasser count is d. In either case, output that pair and join the remaining tables."

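To see join at work, it can be run on the two tables from the GENE walkthrough (slides 38 to 46). This is a sketch that repeats the slide's definition, with otherwise standing in for the x >= y guard; the names tableGE, tableNE and joined are invented for the illustration.

```haskell
join :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
join [] tys = tys
join txs [] = txs
join xs@((x, c) : txxs) ys@((y, d) : tyys)
  | x < y     = (x, c + length ys) : join txxs ys  -- all of ys surpass x
  | otherwise = (y, d) : join xs tyys              -- x >= y: y keeps count d

-- Tables for the two halves of "GENE", as produced on slides 30 and 36.
tableGE, tableNE :: [(Char, Int)]
tableGE = [('E', 0), ('G', 0)]
tableNE = [('E', 0), ('N', 0)]

joined :: [(Char, Int)]
joined = join tableGE tableNE
-- [('E',0),('E',1),('G',1),('N',0)]
```

The result matches the output built up step by step on slides 39 to 46.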
52

Efficiency improvement

• Each time the join function takes the x < y branch it recomputes the length of tys, which costs time proportional to that length.

• To avoid the repeated work, invoke join with an additional argument: a value, n, corresponding to the length of tys. Decrease n by one each time a pair is taken from tys.
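One possible way to carry the length along is sketched below. The names joinN and tableN are hypothetical, chosen here so that they do not clash with the join and table of the other slides; the slides themselves do not show this variant.

```haskell
-- join with an extra argument n, assumed to equal the length of tys.
joinN :: Ord a => Int -> [(a, Int)] -> [(a, Int)] -> [(a, Int)]
joinN _ txs [] = txs
joinN _ [] tys = tys
joinN n txs@((x, c) : txs') tys@((y, d) : tys')
  | x < y     = (x, c + n) : joinN n txs' tys        -- no call to length
  | otherwise = (y, d) : joinN (n - 1) txs tys'      -- tys shrinks by one

-- table, passing the length of the second half to joinN.
-- Like table, it assumes a non-empty list.
tableN :: Ord a => [a] -> [(a, Int)]
tableN [w] = [(w, 0)]
tableN ws = joinN (m - n) (tableN xs) (tableN ys)
  where
    m = length ws
    n = m `div` 2
    (xs, ys) = splitAt n ws
```

With this change each step of the join does a constant amount of work, so joining two tables takes time proportional to their combined length.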

53

Here's how to implement msc

msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))

"Invoke the table function with the list, ws. It returns a list of pairs, (value, surpasser count). Create a list containing all the surpasser counts. Use map snd to accomplish this. Now get the largest surpasser count."

54

Here's the complete implementation

-- msc = maximum surpasser count

msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))

table :: Ord a => [a] -> [(a,Int)]
table [w] = [(w, 0)]
table ws = join (table xs) (table ys)
  where
    m = length ws
    n = m `div` 2
    (xs,ys) = splitAt n ws

join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
join [] tys = tys
join txs [] = txs
join xs@((x,c):txxs) ys@((y,d):tyys)
  | x < y  = (x, c + length ys) : join txxs ys
  | x >= y = (y, d) : join xs tyys
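A simple way to gain confidence in the divide and conquer version is to compare it with the first solution on the same inputs. The sketch below places both in one module; because the two solutions share the name msc on the slides, they are renamed mscQuadratic and mscDivide here, and agree is a helper invented for the comparison.

```haskell
import Data.List (tails)

-- First solution (slide 21), renamed to avoid a name clash.
mscQuadratic :: Ord a => [a] -> Int
mscQuadratic xs = maximum [scount z zs | z : zs <- tails xs]

scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- Divide and conquer solution (slide 54), renamed likewise.
mscDivide :: Ord a => [a] -> Int
mscDivide ws = maximum (map snd (table ws))

table :: Ord a => [a] -> [(a, Int)]
table [w] = [(w, 0)]
table ws = join (table xs) (table ys)
  where
    n = length ws `div` 2
    (xs, ys) = splitAt n ws

join :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
join [] tys = tys
join txs [] = txs
join xs@((x, c) : txxs) ys@((y, d) : tyys)
  | x < y     = (x, c + length ys) : join txxs ys
  | otherwise = (y, d) : join xs tyys

-- True when both solutions give the same answer (non-empty input only).
agree :: Ord a => [a] -> Bool
agree ws = mscQuadratic ws == mscDivide ws
```

For example, agree "GENERATING" holds, with both functions returning 6.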

55

Time Requirements

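• With a list of length “n” the msc function shown on the previous slide takes on the order of n log n steps, provided join is given the length of tys as an argument (slide 52) instead of recomputing it with length. That's a lot faster than the first solution, especially for a large list.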