Design and Implementation* Objective: To design and implement a program for a relatively small yet...

Design and Implementation*

Objective: To design and implement a program for a relatively small yet reasonably complicated problem. To introduce and review a variety of implementation languages and to have students review the pros and cons of different implementation choices and languages.“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious”

– Frederick P. Brooks, The Mythical Man Month

*The examples in these slides come fromBrian W. Kernighan and Rob Pike,The Practice of Programming, Addison-Wesley, 1999.

Themes“Once the data structures are laid out, the algorithms tend to fall into place, and the coding is comparatively easy”The choice of programming language is relatively unimportant to the overall design.Comparing implementations demonstrates how languages can help or hinder, and ways in which they are unimportant.

Topics

Problem: Generate random English text that reads well.Program: some data comes in, some data goes out, and the processing depends on a little ingenuity.Implementations: C, C++, Java, Perl

Failed Attempts

Generate random letters (even with proper frequency).Generate random words chosen from a dictionaryNeed a statistical model with more structure (frequency of phrases)

The Markov Algorithm - Learn

"Learn" from input:Look at all n-word phrases (prefixes); Keep a list of words that follow (one-word

suffices) for each prefix

Store the prefix/suffix-list in a dictionary The key (entry) is the prefix The satellite data, associated w/each prefix,

is the list of suffices

Note, we're "faking" a multi-map. Each prefix has, potentially, several possible suffices.

SampleThe following example uses a subset of Dr. Brooks' quoteUses a prefix length of 2 wordsWe won't bother w/stripping out punctuationWe won't worry about capitalisation (so, "We" and "we" are different strings)We will use a special (Null, Null) prefix to indicate the start.

Sample Markov Algorithm (partial list)

Input prefix Suffix words that follow(null) (null) show

(null) show me

show me your

me your flowcharts tables

your flowcharts and

flowcharts and conceal

your tables and and

will be mystified. obvious.

be obvious (null)

Sample (notes)

We don't filter out duplicates E.g., "and" appears twice for the

prefix ("your", "tables") This is fine. "and" is, from the input,

a nice word to follow. It should be a more probable choice

We also use the suffix (null) to say "Here's a decent place to end our story."

Markov Algorithm – generate text

set w1 and w2 to the first two words in the textprint w1 and w2

loop: randomly choose w3, one of the

successors of w1 and w2 (in sample text)

print w3

replace w1 and w2 by w2 and w3

SampleSay the current prefix is (me, your)

1. Go to our dictionary, find the entry2. Choose a random word from the

suffix list (say, "tables")3. Write that word out4. Make a new prefix: ("your",

"tables")Repeat, until we choose the suffix (null), or decide we've output enough words

ImplementationSee lecture outline for links to 4 different implementations C (see Makefile) C++ Java Perl Python

What are the pros and cons of the different implementations?

The Data Structures

Python and Perl have everything we need built inJava and C++ provide appropriate containers in their standard librariesIn C we'll need to roll these things ourselves

The Data Structures - Python

The dictionary (dict) is given to usThe prefixes, the keys in the dictionary, will be 2-element tuples (immutable)The satellite data will be stored in a list (an array) If a prefix doesn't already exist in the

table, we insert it, along w/an empty list, [] We just append the new suffix onto the

end of this list

Hash TablesWe need a dictionary to map a prefix to its list of suffices (given a prefix, what are the possible suffices?)In C we'll roll our own hash table The prefix (key) is hashed, returning a

value on [0, N-1], where N is the table size

Each element in the table (array) is a bucket, a linked-list of prefixes

Each prefix is associated w/its own list of suffices

The prefix in C

Prefix – stored in an array of strings (ptrs to char):char* pref[ NPREF ]; NPREF – the # of words in a prefix, a

global constant Note, each string exists once in

memory (after call to strdup); various objects just store pointers to these strings.

Hash table entries in C - State

Use chains (linked list of entries) to handle collisionsEach entry (or State) is a node in a linked-list. It contains: The key (the prefix) The satellite data (pointer to a list of suffix)

A pointer to the next Statepref[0]

suf

next

pref[1]

Satellite Data – C

More than one suffix may be associated w/a given entry (State)Stored in a linked listList made of structs, of type Suffix: char* word, the suffix (word) itself pointer to another Suffix

word

next

word

next

The hash table in C

statetab – the table itself Use chains (linked list of States) to

handle collisions statetab is an array of pointers to

States (to the beginning of a linked list)

The hash table – C

“me”

“your”pref[0]

suf

next

pref[1]

a state:

pref[0]

suf

next

pref[1]

another state(same hashkey):

word

next

word

next

a suffix:

another suffix:

“flowcharts”

“tables”

C - eprintfThese are simply some user-defined functions (mostly wrappers) to help w/the coding:

void eprintf( char *, ... ); prints an error msg to stderr, then exits

void weprintf( char *, ... ); prints a warning to stderr; doesn't exit

char* estrdup( char *s ); calls strdup( s ) to dynamically allocate

memory and copy s; exits if malloc fails

C – eprintf (cont.)void* emalloc(size_t n);

Calls malloc(n). Exits upon failurevoid* erealloc(void *p, size_t n);

Calls realloc( p, n ). Exits upon failure

void setprogname(char *);If set, uses program name in messages

char* progname(void);

memmoveMoves (low-level) a block of memorymemmove( d, s, n ) Moves n-bytes, starting at s (source) to t

(target)Given prefix (w1, …, wn-1), with suffix wn:memove(prefix, prefix+1,(NPREF-1)*sizeof(prefix[0]) );

prefix[NPREF-1] = suffix;

prefix is now ( w2, …, wn ) Just slides everybody down 1, to

make/get next prefix

Choosing a suffixConsider this line, in generate():if ( rand() % ++nmatch == 0 ) We walk the list of suffices For the first one, we mod by 1 (100% chance of

choosing this one) At the next one, we mod by 2 (50% chance of

choosing this one) At the 3rd, 1 in 3 At the 4th, 1 in 4 …

Convince yourself that each suffix has equal probability of being chosen.

Design and Implementation* Objective: To design and implement a program for a relatively small yet...

Documents

Transcript of Design and Implementation* Objective: To design and implement a program for a relatively small yet...