Design and Implementation* Objective: To design and implement a program for a relatively small yet...
-
Upload
kerrie-garrison -
Category
Documents
-
view
216 -
download
1
Transcript of Design and Implementation* Objective: To design and implement a program for a relatively small yet...
Design and Implementation*
Objective: To design and implement a program for a relatively small yet reasonably complicated problem. To introduce and review a variety of implementation languages and to have students review the pros and cons of different implementation choices and languages.“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious”
– Frederick P. Brooks, The Mythical Man Month
*The examples in these slides come fromBrian W. Kernighan and Rob Pike,The Practice of Programming, Addison-Wesley, 1999.
Themes“Once the data structures are laid out, the algorithms tend to fall into place, and the coding is comparatively easy”The choice of programming language is relatively unimportant to the overall design.Comparing implementations demonstrates how languages can help or hinder, and ways in which they are unimportant.
Topics
Problem: Generate random English text that reads well.Program: some data comes in, some data goes out, and the processing depends on a little ingenuity.Implementations: C, C++, Java, Perl
Failed Attempts
Generate random letters (even with proper frequency).Generate random words chosen from a dictionaryNeed a statistical model with more structure (frequency of phrases)
The Markov Algorithm - Learn
"Learn" from input:Look at all n-word phrases (prefixes); Keep a list of words that follow (one-word
suffices) for each prefix
Store the prefix/suffix-list in a dictionary The key (entry) is the prefix The satellite data, associated w/each prefix,
is the list of suffices
Note, we're "faking" a multi-map. Each prefix has, potentially, several possible suffices.
SampleThe following example uses a subset of Dr. Brooks' quoteUses a prefix length of 2 wordsWe won't bother w/stripping out punctuationWe won't worry about capitalisation (so, "We" and "we" are different strings)We will use a special (Null, Null) prefix to indicate the start.
Sample Markov Algorithm (partial list)
Input prefix Suffix words that follow(null) (null) show
(null) show me
show me your
me your flowcharts tables
your flowcharts and
flowcharts and conceal
your tables and and
will be mystified. obvious.
be obvious (null)
Sample (notes)
We don't filter out duplicates E.g., "and" appears twice for the
prefix ("your", "tables") This is fine. "and" is, from the input,
a nice word to follow. It should be a more probable choice
We also use the suffix (null) to say "Here's a decent place to end our story."
Markov Algorithm – generate text
set w1 and w2 to the first two words in the textprint w1 and w2
loop: randomly choose w3, one of the
successors of w1 and w2 (in sample text)
print w3
replace w1 and w2 by w2 and w3
SampleSay the current prefix is (me, your)
1. Go to our dictionary, find the entry2. Choose a random word from the
suffix list (say, "tables")3. Write that word out4. Make a new prefix: ("your",
"tables")Repeat, until we choose the suffix (null), or decide we've output enough words
ImplementationSee lecture outline for links to 4 different implementations C (see Makefile) C++ Java Perl Python
What are the pros and cons of the different implementations?
The Data Structures
Python and Perl have everything we need built inJava and C++ provide appropriate containers in their standard librariesIn C we'll need to roll these things ourselves
The Data Structures - Python
The dictionary (dict) is given to usThe prefixes, the keys in the dictionary, will be 2-element tuples (immutable)The satellite data will be stored in a list (an array) If a prefix doesn't already exist in the
table, we insert it, along w/an empty list, [] We just append the new suffix onto the
end of this list
Hash TablesWe need a dictionary to map a prefix to its list of suffices (given a prefix, what are the possible suffices?)In C we'll roll our own hash table The prefix (key) is hashed, returning a
value on [0, N-1], where N is the table size
Each element in the table (array) is a bucket, a linked-list of prefixes
Each prefix is associated w/its own list of suffices
The prefix in C
Prefix – stored in an array of strings (ptrs to char):char* pref[ NPREF ]; NPREF – the # of words in a prefix, a
global constant Note, each string exists once in
memory (after call to strdup); various objects just store pointers to these strings.
Hash table entries in C - State
Use chains (linked list of entries) to handle collisionsEach entry (or State) is a node in a linked-list. It contains: The key (the prefix) The satellite data (pointer to a list of suffix)
A pointer to the next Statepref[0]
suf
next
pref[1]
Satellite Data – C
More than one suffix may be associated w/a given entry (State)Stored in a linked listList made of structs, of type Suffix: char* word, the suffix (word) itself pointer to another Suffix
word
next
word
next
The hash table in C
statetab – the table itself Use chains (linked list of States) to
handle collisions statetab is an array of pointers to
States (to the beginning of a linked list)
The hash table – C
“me”
“your”pref[0]
suf
next
pref[1]
a state:
pref[0]
suf
next
pref[1]
another state(same hashkey):
word
next
word
next
a suffix:
another suffix:
“flowcharts”
“tables”
C - eprintfThese are simply some user-defined functions (mostly wrappers) to help w/the coding:
void eprintf( char *, ... ); prints an error msg to stderr, then exits
void weprintf( char *, ... ); prints a warning to stderr; doesn't exit
char* estrdup( char *s ); calls strdup( s ) to dynamically allocate
memory and copy s; exits if malloc fails
C – eprintf (cont.)void* emalloc(size_t n);
Calls malloc(n). Exits upon failurevoid* erealloc(void *p, size_t n);
Calls realloc( p, n ). Exits upon failure
void setprogname(char *);If set, uses program name in messages
char* progname(void);
memmoveMoves (low-level) a block of memorymemmove( d, s, n ) Moves n-bytes, starting at s (source) to t
(target)Given prefix (w1, …, wn-1), with suffix wn:memove(prefix, prefix+1,(NPREF-1)*sizeof(prefix[0]) );
prefix[NPREF-1] = suffix;
prefix is now ( w2, …, wn ) Just slides everybody down 1, to
make/get next prefix
Choosing a suffixConsider this line, in generate():if ( rand() % ++nmatch == 0 ) We walk the list of suffices For the first one, we mod by 1 (100% chance of
choosing this one) At the next one, we mod by 2 (50% chance of
choosing this one) At the 3rd, 1 in 3 At the 4th, 1 in 4 …
Convince yourself that each suffix has equal probability of being chosen.