File No. UIUCDCS-F-85-931
MEL: A Learning Program that Improves
by Experience in Playing the Game of MILL
Paul Hoffman
Department of Computer Science
University of Illinois
at Urbana-Champaign
January 1985
ISG 85-2
ACKNOWLEDGEMENTS
This project was made possible through the help and support of
several people. Primary thanks go to my advisor, Dr. R. S. Michalski,
for allowing me the freedom to develop my own project. The project
is an application of his basic research in machine learning.
Dr. Claude Sammut introduced me to the joys of PROLOG and encouraged
this work in its infancy as a class project. I am indebted to
several members of the Intelligent Systems Group at the University
of Illinois. Chief among these are: Tom Channic for help with
his PROLOGRAPHICS package, Tony Nowicki for system support and
Bruce Katz for valuable discussions about GEM and game-playing in
general.
I am grateful for the emotional and financial support of my
parents. I couldn't have done it without them.
This research was supported, in part, by the National Science
Foundation under grant DCR 84-06801 and the Office of Naval Research
under grant N00014-82-K-0186.
TABLE OF CONTENTS
1. INTRODUCTION .................................................................................................................................. 1
2. THE GAME OF MILL ..................................................................................................................... ...... 2
3. THE LEARNING PROCESS ................................................................................................................ 4
3.1 Learning by Example ............................................................................................................ 4
3.1.1 Recording Events .............................................................................................. 4
3.1.2 Events from Observation ................................................................................. 5
3.1.3 Events from a Teacher ..................................................................................... 6
3.1.4 Events from Experimentation .......................................................................... 7
3.2 Codifying Experience ............................................................................................................. 7
4. GEM ........................................................................................................................................................ 10
APPENDIX A: USER'S GUIDE ........................................................................................................ 12
REFERENCES ..................................................................................................................................... 18
1. INTRODUCTION
Most programs for playing non-trivial competitive games use some variant of the minimax
algorithm, first suggested in the early 1950's by Claude Shannon. Move selection is accomplished
by generating a tree of moves, replies to those moves, replies to the replies, and so on. The best
move is assumed to be the one which leads to the best position at some arbitrary depth in the tree.
For such programs, the quality of play depends on how much of the tree can be generated and
evaluated, given certain time and/or space restrictions. The epitome of this approach is Belle
[Condon and Thompson 82], the current world computer chess champion, which examines just under
three million positions in the average three minutes per move it is allotted in tournament play.
The few human players who can defeat Belle typically examine no more than 100 positions per
move. They rely instead on a vast knowledge of the game. Endowing a program with such knowledge
is a difficult task. The basic problem is one of knowledge acquisition. Human experts (in various
fields) often have difficulty expressing exactly how and why they arrived at a particular decision.
[Michie 82] illustrates the problem with the story of a cheese factory famous for its camemberts.

    ... every hundredth cheese was sampled to ensure that the production process was still on
    the narrow path separating the marginally unripe from the marginally over-ripe. Success
    rested on the uncanny powers developed by one very old man, whose procedure was to thrust
    his index finger into the cheese, close his eyes and utter an opinion. If only because of
    the expert's age and frailty, automation seemed to be required, and an ambitious R&D
    project was launched. After much escalation of cost and elaboration of method, no progress
    had been registered. Substantial inducements were offered to the sage for a precise account
    of how he did the trick. He could offer little, beyond the advice: "It's got to feel
    right!" In the end it turned out that feel had nothing to do with it. After breaking the
    crust with his finger, the expert was interpreting subliminal signals from his sense of
    smell.
This paper describes an attempt at constructing a knowledge-based player for the board game Mill.
The program, MEL(1), acquires its knowledge in much the same way a human player would - from a
teacher, by observing games or by playing games itself. This knowledge is recorded in the form of
examples of play. When a number of examples have been assimilated, MEL invokes program GEM to
induce rules of play from the examples. The induced (or learned) rules are generalizations of the
examples.

(1) The name MEL is composed of the first, second and third letters, respectively, of Machine
Learning (of) Mill.
Finally, MEL translates and reorganizes the learned rules so that they can be used by the program
to play the game.
2. THE GAME OF MILL
Mill(2) is an old game, having been played by the ancient Greeks. It derives its name from the
repetitive moving of a player's pieces (stones) to grind down an opponent. In England, the game is
known as Nine Men's Morris after its resemblance to "morris" (Moorish) dances.
Mill is played on the board shown in Figure 1. The players, White and Black, are each equipped
with nine playing pieces of their color. The play can be partitioned into three distinct phases: placing,
 1-----------2-----------3
 |           |           |
 |   4-------5-------6   |
 |   |       |       |   |
 |   |   7---8---9   |   |
 |   |   |       |   |   |
10--11--12      13--14--15
 |   |   |       |   |   |
 |   |  16--17--18   |   |
 |   |       |       |   |
 |  19------20------21   |
 |           |           |
22----------23----------24

Figure 1. The Mill board with labeled junctions.
(2) [Morehead and Mott-Smith 76] give a brief history. [Scarne 73] discusses some basic strategy.
moving, and flying. The game begins with an empty board in the placing phase. The players (White
first) alternately place their pieces at any one of the 24 junctions where two lines intersect,
provided no piece has yet been placed there. When all pieces have been placed the game enters the
moving phase. Players now move their pieces along the lines to adjacent unoccupied junctions. The
object of this maneuvering is to align three pieces of the same color on the same line. Such an
arrangement is called a mill and the player who forms one is entitled to remove one opposing
piece, provided that that piece is not itself part of a mill. Once a mill has been formed and a
piece removed, it may on a subsequent turn be opened by moving one piece off the line. If the mill
is then reformed (closed), another piece may be removed. Much of the strategy in the moving phase
involves opening and closing mills at the right times. When a player has been reduced to three
pieces, he enters the flying phase. He is no longer restricted to moving between adjacent
junctions, but may move to any empty junction. A player loses when he has been reduced to two
pieces, or when he cannot move. Games between good players usually end in draws.
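The mill rule is concrete enough to sketch. The following Python fragment is illustrative only
(MEL itself is written in UNSW Prolog, and every identifier here is an assumption, not the
program's): it enumerates the sixteen lines of the Figure 1 board and applies the restriction that
a piece inside a mill may not be captured.

```python
# Junctions are numbered 1-24 as in Figure 1; a board maps each junction
# to 'w' (white), 'b' (black) or 'e' (empty).

# The 16 lines on which a mill can be formed (8 horizontal, 8 vertical).
MILL_LINES = [
    (1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12),
    (13, 14, 15), (16, 17, 18), (19, 20, 21), (22, 23, 24),
    (1, 10, 22), (4, 11, 19), (7, 12, 16), (2, 5, 8),
    (17, 20, 23), (9, 13, 18), (6, 14, 21), (3, 15, 24),
]

def mills_through(board, junction):
    """Lines through `junction` occupied by three pieces of one colour,
    i.e. the mills this junction is part of."""
    colour = board[junction]
    if colour == 'e':
        return []
    return [line for line in MILL_LINES
            if junction in line and all(board[j] == colour for j in line)]

def removable(board, colour):
    """Opposing pieces that may be taken after a mill is closed: any
    piece of `colour` that is not itself part of a mill."""
    return [j for j, c in board.items()
            if c == colour and not mills_through(board, j)]
```

With a white mill on 1-2-3 and a lone black piece at junction 4, `removable(board, 'b')` yields
the black piece, while the three white pieces, being in a mill, are protected.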
Variations of the game exist. Most involve the flying phase, either eliminating it entirely or
varying the number of pieces with which a player may fly. In some games a piece may be removed
from a mill if there are no other choices. Go-bang is a related game in the family of Go games.
For the purposes of the program, removing was added as a fourth phase. A player enters this
phase when he has formed a mill and returns to his previous phase upon capturing an opposing piece.
Due to the symmetry of the board, there are only four move types. They correspond to moves at
junctions 1, 2, 4, and 5 and are called t1, t2, t3, and t4 events, respectively. (The t stands for
type.) The program makes extensive use of this symmetry, using it to reduce the amount of data and
complexity of the rules. For example, opening moves to junctions 1, 3, 7, 9, 16, 18, 22 or 24 (all
t1 moves) are considered identical and are treated internally as if the move had been made to
junction 1. All of this is invisible to the user, however, who sees board pieces at the junctions
where they were placed.
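The symmetry reduction can be sketched as follows. This is an illustrative Python fragment, not
MEL's Prolog: the `ROT90` table is derived from the Figure 1 numbering, and only rotations are
shown, whereas MEL's orient module also uses reflections and an inner/outer ring exchange (sixteen
orientations in all), which is why inner-square corners such as 7 are t1 moves too.

```python
# Rotate the board 90 degrees counterclockwise. ROT90[j] names the
# junction whose piece ends up at j after the rotation; four applications
# return the board to its original orientation.
ROT90 = {1: 3, 2: 15, 3: 24, 15: 23, 24: 22, 23: 10, 22: 1, 10: 2,
         4: 6, 5: 14, 6: 21, 14: 20, 21: 19, 20: 11, 19: 4, 11: 5,
         7: 9, 8: 13, 9: 18, 13: 17, 18: 16, 17: 12, 16: 7, 12: 8}

def rotate_ccw(board):
    return {j: board[ROT90[j]] for j in board}

def move_type(junction):
    """Classify a junction by symmetry: t1 = corner of the outer or inner
    square, t2 = midpoint of the outer or inner square, t3 = corner of
    the middle square, t4 = midpoint of the middle square."""
    if junction in (1, 3, 7, 9, 16, 18, 22, 24):
        return 't1'
    if junction in (2, 8, 10, 12, 13, 15, 17, 23):
        return 't2'
    if junction in (4, 6, 19, 21):
        return 't3'
    return 't4'  # junctions 5, 11, 14, 20
```

A move to junction 3 is normalized by one counterclockwise rotation, after which the piece that
was at junction 3 sits at junction 1, matching the example in Section 3.1.1.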
3. THE LEARNING PROCESS
When certain principles of play are well known and straightforward, it is useful to sidestep
learning and provide this knowledge in some direct manner. MEL began as a class project in just
this way, using static, programmed rules for play. The only way to increase the level of play was
to write more (or better) rules. The advantage of rote learning(3) is the speed with which the
knowledge is gained. The disadvantages are the difficulties of preparing such knowledge initially
and modifying it later.(4)
3.1. Learning by Example
The method of learning which MEL uses is learning by example. The examples are moves to or
from a junction and are called events. Moves made during the placing or removing phases are each
represented by a single event. Moves made during the moving or flying phases are represented by
two events, one for the "from" portion and one for the "to" portion. Each event is a set of
attributes. Attributes are such facts as the color of the moving player and the colors of each of
the 24 junctions. Events representing the "from" portion of a moving or flying move have an
additional attribute indicating the junction from which the piece was moved.
MEL organizes events according to move type and event type. Event types correspond to the
game phases except that moving and flying are subdivided into their "from" and "to" portions. The
event types are denoted p, r, m1, m2, f1 and f2 for placing, removing, moving from, moving to,
flying from and flying to respectively. Events of each type are further divided according to move
type (t1, t2, t3 and t4).
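The organization just described might be sketched as follows. This is an illustrative Python
rendering (the actual database is a collection of Prolog facts, and these names are assumptions):
an event carries the mover's colour, the 24 junction colours and, for "from" events, the source
junction, and the database is keyed by the (event type, move type) pair.

```python
from collections import defaultdict

def make_event(colour, board, from_junction=None):
    """Build an event: the moving player's colour, one attribute per
    junction (s1..s24), and, for m1/f1 events only, the source junction."""
    event = {'c': colour}
    event.update({f's{j}': board[j] for j in sorted(board)})
    if from_junction is not None:
        event['from'] = from_junction
    return event

# (event_type, move_type) -> list of events, e.g. ('p', 't1') for
# placing moves of the corner type.
events = defaultdict(list)

def store(event_type, move_type, event):
    events[(event_type, move_type)].append(event)
```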
3.1.1. Recording Events
Recording an event is a complicated process. The board is first reoriented (normalized) so that
the junction of the move (from or to) becomes one of the junctions 1, 2, 4 or 5. A move to junction 3,
(3) The classification of learning into rote learning, learning by being told, learning by
analogy, learning from examples and learning by observation and discovery is due to [Carbonell,
Michalski and Mitchell 83].
(4) Rote learning still plays a small part in MEL. The programmed rules are still present in the
guise of the machine player
for example, causes the board to be rotated 90 degrees counterclockwise so that junction 3 is in
the position of junction 1. The normalized event must then be checked to make sure it is not a
duplicate of one already recorded. Events which differ merely in orientation are considered
duplicates. Then the event must be checked for consistency. The database of events is consistent
if each event belongs to only one of the four classes (t1, t2, t3 or t4). Inconsistencies arise
when the same event is classified as, say, both a t1 and a t2 event. The implicit assumption is
that for any given position (set of attributes) only one move (classification) is correct. If an
event is consistent and unique, it is recorded. If a move is inconsistent, the original
classification can be allowed to stand or the event can be reclassified.
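The recording discipline can be sketched as a single routine. This is an illustrative Python
fragment under assumed names, not MEL's code; the event passed in is assumed to be already
normalized, and the database maps each move-type class to its recorded events.

```python
def record(database, move_type, event, reclassify=False):
    """database maps 't1'..'t4' to lists of events (attribute dicts).
    An event enters the database only if it is new and no other class
    already claims it."""
    for cls, recorded in database.items():
        if event in recorded:
            if cls == move_type:
                return 'duplicate'       # already known, nothing to do
            # Same position, different class: recording would make the
            # database inconsistent. Keep the original classification
            # unless told to reclassify.
            if reclassify:
                recorded.remove(event)
                database[move_type].append(event)
                return 'reclassified'
            return 'inconsistent'
    database[move_type].append(event)
    return 'recorded'
```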
MEL obtains experience in the form of events from three different sources. It can observe a game
and record the moves of either or both players. It can also be given specific examples by an
external expert or teacher. Finally, MEL can provide its own examples by playing games and
recording those moves which lead to some desired outcome. The next three sections deal with these
sources individually.
3.1.2. Events from Observation
MEL allows six player types to compete in a game. Observing the moves of either or both is one
way MEL gains experience. The player should be consistent in his moves (and will be informed if he
is not). The more skillful the player, the better experience MEL will gain.
A human is one player type which can be observed. Moves are input via a mouse. If a move is
inconsistent with a previous move, the human player determines which is correct. A skilled human
player provides the best examples from observation. The drawback is that playing the number of games
required to provide a good set of examples requires a good deal of patience.
Another observable player is the machine player which uses a programmed (not learned) set of
rules to generate moves. This player is not highly skilled, but it removes the tedium of human move
generation and is suitable for obtaining examples of reasonable (if not brilliant) play. The machine
player does fairly well in the placing phase and so is most useful for providing examples of play from
type. Also, MEL has been programmed with the definitions of legal moves.
that phase. It plays quite poorly in the other phases and the examples gained are of little value.
The reason for the poor play is poor rules. Good rules for the moving and flying phases are
particularly difficult to write, and this difficulty was the main inspiration for a program which
could acquire knowledge automatically. When the machine player generates an inconsistent move, the
original move is sustained.
The learner player type can also be observed. This player is identical to the human player,
except that a list of moves generated from the learned rules is provided. The main value of this
player type is for fine-tuning learned rules. For large sets of learned rules, however, generating
the list of moves is quite time-consuming.
The learned player is the final player which can provide examples by observation. This player
generates moves using the learned rules. The only reason to observe this player is to create a
more robust set of examples. This should only be done when the learned rules are of good quality.
A player which cannot be observed is the random player. The random player generates legal
moves, but since its play is inconsistent, observing it would be of little value. It does make a
useful opponent for an observed player, generating moves which may be unsound, but which would
never be encountered in a game between skilled players. The experimenter is the other player type
which cannot be observed. This is because moves made by this player are not added to the database
of events unless they lead to a favorable board position. Moves are generated from the learned
rules. The experimenter player can be thought of as a learned player which is observed only when
it is doing well. It will be discussed further in Section 3.1.4.
3.1.3. Events from a Teacher
It was noted in Section 3 that direct implantation of knowledge is often useful. MEL allows
examples of play to be presented directly. Events obtained in this manner are allowed to have a
junction color attribute of "don't care" in addition to the usual white, black or empty. Suppose
the teacher wishes to provide an example of white completing a mill to help MEL learn that
concept. An example
which might be provided is an empty junction 1 with junctions 2 and 3 white. The colors of the
other junctions are of no interest and can be valued as don't-care. Examples provided by a teacher
help reduce the number of events which must be recorded when a player is observed during a game.
If the player makes a move which matches, attribute for attribute, a teacher-provided example, MEL
can disregard the observed example since there is no new knowledge to be gained. (The don't-care
matches any value.)
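The don't-care test is simple enough to sketch. In this illustrative Python fragment (names and
the '*' marker for don't-care are assumptions, not MEL's notation), an observed event is worth
recording only if no teacher-provided example already covers it.

```python
def matches(teacher_example, observed_event):
    """Attribute-for-attribute match; '*' (don't-care) matches anything."""
    return all(value == '*' or observed_event.get(attr) == value
               for attr, value in teacher_example.items())

def worth_recording(observed_event, teacher_examples):
    """An observed event adds no new knowledge if some teacher example
    already covers it."""
    return not any(matches(t, observed_event) for t in teacher_examples)
```

For the example in the text - junction 1 empty, junctions 2 and 3 white, everything else
don't-care - any observed placing that completes that mill is covered and need not be recorded.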
3.1.4. Events from Experimentation
The experimenter player type allows MEL to provide its own examples. Instead of recording
moves as events as soon as they are observed, they are recorded as temporary events called
t_events. When a favorable position is reached, a count associated with the t_event is
incremented. When this count reaches a certain threshold, the t_event is rerecorded as a regular
event. If the threshold were two, for example, no event would be recorded until it had twice led
to a favorable position. (The idea of thresholds is due to R. S. Michalski.) A high value for the
threshold provides high quality events at a slow pace. A low value provides many events of lesser
quality. Definitions of favorable positions must be provided, but they can be as simple as a won
game.
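The promotion scheme can be sketched in a few lines. This Python fragment is illustrative only
(class and method names are assumptions): a tentative move is held as a t_event and promoted to a
regular event once it has led to a favorable position `threshold` times.

```python
class Experimenter:
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.t_events = {}      # tentative event -> favorable count
        self.events = []        # promoted, regular events

    def credit(self, event_key):
        """Call when the move recorded under event_key has led to a
        favorable position; promote it once the threshold is reached."""
        count = self.t_events.get(event_key, 0) + 1
        if count >= self.threshold:
            self.events.append(event_key)
            self.t_events.pop(event_key, None)
        else:
            self.t_events[event_key] = count
```

A high threshold yields few but well-tested events; a threshold of one makes the experimenter
behave like an observed learned player, as the text notes.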
3.2. Codifying Experience
Having assimilated a number of events, MEL has a rather useless assemblage of knowledge.
Unless a situation is encountered for which an event has already been recorded (an improbable
event), MEL's experience is of no value. What is needed is some method of transferring the
specific knowledge of the events to a more general form which can be applied to new situations.
Specifically, once we have a set of events in which a corner move (t1) is appropriate, we want a
description of this set which includes all of the t1 events and no events from the other three
classes.

MEL transforms its knowledge using instance-to-class generalization. After a number of events
have been collected, MEL uses program GEM to generate generalized rules for each event type. These
t1-events
#  c  s1 s2 s3 s4 s5 s6 s7 s8
1  w  e  w  w  b  e  b  b  b
2  w  e  w  w  e  w  e  e  w
3  w  e  w  w  e  b  e  e  b

t2-events
#  c  s1 s2 s3 s4 s5 s6 s7 s8
1  w  w  e  w  e  b  e  b  e
2  w  w  e  e  b  w  e  e  w
3  w  w  e  w  w  b  w  e  e
4  w  e  e  e  e  w  e  e  w

t3-events
#  c  s1 s2 s3 s4 s5 s6 s7 s8
1  w  e  e  e  e  w  w  e  e

t4-events
#  c  s1 s2 s3 s4 s5 s6 s7 s8
1  w  w  b  b  e  w  e  w  e
2  w  w  b  w  e  w  e  e  e

Figure 2. Example GEM input.
rules describe the events of each move type and distinguish them from the events of the other
three move types.

An example will make this clear. Figure 2 shows GEM input in the form of a relational table.
The table represents a collection of events of type placing, grouped by move type. Each row is an
event and each column contains values for a particular attribute. The attributes in the column
labeled c are the colors of the moving player (white or black). The other attributes are the
colors (white, black or empty) of the board junctions. The events in this example all show white
placing to complete a mill (the boldfaced attributes). Thus, event #1 of the t1 events shows white
moving to junction 1 when junctions 2 and 3 are white (which completes a white mill at 1-2-3).
Only the first eight board junctions are shown
although in actual practice all 24 would be present.
Figure 3 shows some possible GEM output for the example in Figure 2. A set of disjunctive
complexes form the descriptions for each class. The complexes are formed from conjunctive
selectors (the bracketed expressions). Thus, the description of a t2 move in English is: (1)
junction 2 is empty and junctions 1 and 3 are white, or (2) junction 2 is empty and junctions 5
and 8 are white.
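A class description of this kind is a disjunction of conjunctions, and evaluating one against a
position can be sketched directly. This illustrative Python fragment (the real representation is
variable-valued logic translated to Prolog) encodes the t2 description of Figure 3 as a list of
complexes, each a list of (attribute, value) selectors.

```python
def satisfies(position, complexes):
    """A position satisfies a class description if it satisfies every
    selector of at least one complex."""
    return any(all(position.get(attr) == value
                   for attr, value in cpx)
               for cpx in complexes)

# The t2 description of Figure 3: [s1=w][s2=e][s3=w] or [s2=e][s5=w][s8=w].
T2 = [
    [('s1', 'w'), ('s2', 'e'), ('s3', 'w')],
    [('s2', 'e'), ('s5', 'w'), ('s8', 'w')],
]
```

A position with junction 2 empty and junctions 5 and 8 white satisfies the second complex, so a t2
move (to junction 2) would be recommended.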
The classification rules produced by GEM are generalizations because they describe more
situations than did the original events. The salient features of t1 events, according to GEM, are
that junction 1 is empty and junctions 2 and 3 are white. In addition, this description does not
describe events of any other move type. If a new event is encountered which fits this general
description, the general rule can be used to decide that a move to junction 1 is appropriate. If a
new event is encountered which (according to the rules) belongs to more than one class, the rules
are too general. If the new event is described
t1-outhypo
# cpx
1 [s1=e] [s2=w] [s3=w]

t2-outhypo
# cpx
1 [s1=w] [s2=e] [s3=w]
2 [s2=e] [s5=w] [s8=w]

t3-outhypo
# cpx
1 [s4=e] [s5=w] [s6=w]

t4-outhypo
# cpx
1 [s2=w] [s5=e] [s8=w]
2 [s2=w] [s5=e] [s8=w]

Figure 3. Example GEM output.
by none of the rules, the rules are too specific. In either case, the new event should be added to
the database of events and GEM rerun.

MEL does not use GEM output directly. The rules are translated into a single Prolog
statement. The Prolog statement is a representation of an optimal binary tree for evaluating which
complexes are satisfied (and thus which move to make). The nodes of the tree are the selectors and
the branches represent whether or not the node was satisfied. The selector which appears at the
top of any subtree is the selector which appears in the most complexes represented by that
subtree. Therefore, the root node of the tree (which represents all complexes) is the selector
which appears most often. In the example, the root node is the selector [s2=w] since it appears in
three complexes. The left subtree under the root represents the rules in which [s2=w] appears and
the right subtree those in which it does not appear. With this arrangement, the number of times
each complex must be evaluated is minimized. The learned rules apply to a board in the normal
orientation. The board may need to be reoriented to check for all rules which can be satisfied.
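The tree construction can be sketched as a recursive partition. This is an illustrative Python
rendering of the idea only (MEL emits a Prolog term, and the details of its evaluation order are
omitted here): the root of every subtree is the selector occurring in the most complexes of that
subtree; the left branch keeps the complexes containing it, with the selector discharged since it
is now known true, and the right branch keeps the rest.

```python
from collections import Counter

def build_tree(complexes):
    """complexes: list of frozensets of (attribute, value) selectors.
    Returns (selector, left, right), 'SATISFIED', or None."""
    complexes = list(complexes)
    if not complexes:
        return None
    if any(len(c) == 0 for c in complexes):
        return 'SATISFIED'          # some complex is fully matched
    # Most frequently occurring selector becomes the root of this subtree.
    counts = Counter(s for c in complexes for s in c)
    selector, _ = counts.most_common(1)[0]
    with_sel = [c - {selector} for c in complexes if selector in c]
    without = [c for c in complexes if selector not in c]
    return (selector, build_tree(with_sel), build_tree(without))
```

Because shared selectors are hoisted toward the root, a selector common to several complexes is
tested once instead of once per complex, which is the economy the text describes.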
4. GEM
GEM (or more precisely GEM1.0 [Reinke 84]) is the latest in a series of induction programs
developed by the Intelligent Systems Group at the University of Illinois at Urbana-Champaign. The
AQ algorithm is at the heart of the various versions of GEM. Briefly, the AQ algorithm produces
descriptions of classes of events. Each event is a vector of attribute values. The attribute
values are discrete and belong to finite domains. For an easy-to-understand explanation of how the
algorithm works, consult [Reinke 84]. For a detailed theoretical discussion see [Michalski 75].

GEM provides input and output for AQ which is geared for use by a knowledge engineer. As
mentioned earlier, GEM input is in the form of relational tables and output is in the form of
variable-valued logic expressions. Additional input information regarding attributes and their
domains can be provided. GEM allows a cost to be associated with each attribute. This can be
viewed as the cost for evaluating the attribute or as a measure of importance of an attribute, a
lower value implying less cost or more
importance. LEFs (Lexicographical Evaluation Functions) tell GEM what type of complexes to form.
An LEF can specify short complexes, long complexes, or complexes utilizing the most important
attributes. MEL takes advantage of this and places greater importance on the junctions which are
closest to the normalized junctions 1, 2, 4 and 5. This provides a focus of attention in the rules
near the move junctions. The attributes which are cheapest to evaluate are the color of the moving
player and the "from" junction of a moving or flying move. MEL uses LEFs which dictate short,
cheap complexes consisting of junctions near where a move is to be made.
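The lexicographic preference can be sketched as a compound sort key. This Python fragment is
illustrative only: the cost table below is an assumption in the spirit of the text (mover's colour
and "from" junction cheapest, junctions near the normalized move junctions cheaper than distant
ones), not GEM's actual cost assignment.

```python
# Assumed costs: lower is cheaper / more important. Junctions not listed
# default to 3 (far from the normalized move junctions).
COSTS = {'c': 0, 'from': 0,
         's1': 1, 's2': 1, 's4': 1, 's5': 1,   # the move junctions
         's3': 2, 's6': 2, 's8': 2}            # nearby junctions

def lef_key(cpx):
    """Lexicographic criteria: first complex length (shorter preferred),
    then total attribute cost (cheaper preferred)."""
    return (len(cpx), sum(COSTS.get(attr, 3) for attr, _ in cpx))

def prefer(candidates):
    return min(candidates, key=lef_key)
```

Length is compared first, so a one-selector complex beats any two-selector complex regardless of
cost; cost only breaks ties among complexes of equal length.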
APPENDIX A
USER'S GUIDE
MEL is written in UNSW Prolog [Sammut 83] and uses PROLOGRAPHICS [Channic 83] for
graphics on a Sun Microsystems display. A stripped-down non-interactive version with no display
was used for lengthy batch runs.

MEL is loaded from a window on the Sun by entering:

    sprolog -s300000 sunmill [event-file] [rule-file]

The request for such a huge amount of stack space (-s300000) is unnecessary if no GEM output will
be translated to Prolog. A request of 50000 units will suffice if no translation is to be done.
Due to the program's size, MEL takes about a minute to load. Optional files for events and learned
rules may follow the program file and will extend the load time.
Most input is accomplished through a three-button mouse. Prompts are provided as to which
button to push. Multiple choice questions with more than three alternatives are selected from
menus by positioning the mouse over the desired choice and pressing any mouse button. Moves are
selected in a similar manner by positioning the pointer at or near the desired junction or piece.
UNSW Prolog commands must be entered from the keyboard. Remember that the sunwindow package
requires the mouse pointer to be in the text window for keyboard input.
After MEL has finished loading, you will be prompted for gprolog commands. The command to
begin playing is:

    play_game!
You will then be asked whether you want instructions. An affirmative answer scrolls a brief description
of the game in the text window. The next thing to decide is what types of players should compete from
a menu of the six possible choices. Refer to the descriptions of the player types in Section 3.1.2
to help you decide. Finally, you will be prompted as to whether either player's moves should be
recorded. Remember that the moves of player types random and experimenter cannot be recorded. Play
begins after this information has been entered.
If neither player requires human input, the game will progress on its own until one player
wins or 50 moves have been made by each side. Players requiring human input (types human and
learner) indicate their moves by placing the mouse pointer over the desired junction or piece and
pressing a mouse button. If a player has a single choice of move, the move will be selected
automatically. On any player's move, the single-item menu labeled "RETRACT MOVE" may be selected.
This action returns the game to the state it was in before that player's last move. Events
(regular and temporary) which were recorded for retracted moves are also erased. If the player was
selecting the "to" portion of a moving or flying move, RETRACT MOVE returns the game to the start
of the current move. After the game has concluded, you will again be prompted for gprolog
commands.
In addition to the regular UNSW Prolog commands for editing files, etc., MEL has several
useful commands which can be entered from the keyboard. They are listed below with a brief
description.

play_game!           begins a game.

                     stops a game (or any other action). This is not usually dangerous unless it
                     is done while something is being drawn on the screen.

learning(yes)!       gives you the menu for learning. See below for details.

save_events(file)!   saves all regular events in file.

clear_events!        removes all regular events.

                     saves all temporary events in file.

                     removes all temporary events.

save_rules(file)!    saves learned rules in file.
example!             allows the input of examples. See below for details.
To provide specific examples of play, enter (in the text window):
example!
You will be asked to select the game phase for which your example applies from a menu. After you
have chosen the phase, a board will be drawn with a question mark at each junction. The question
marks indicate that the color of that junction is irrelevant to the example and may have any
value. Fill in the junctions that are relevant by positioning the mouse pointer at the junction
and pressing button 1 for a white piece, 2 for an empty junction, and 3 for a black piece. The
position which you create must have at least one legal move for the chosen phase. (For placing,
moving or flying, this means you must specify at least one empty junction.) You may change the
colors of junctions as many times as you wish. When you are finished, select the single-item menu
labeled "END". If there is more than one legal move for the example you set up, you will be asked
to input that move. This is done in the same way as a move in a game. You will then be asked
whether another example is to be entered or to end example input.
After MEL has acquired some examples of play either by observing some games or from specific
examples you have provided, learned rules can be generated by entering:

    learning!

You will be asked whether you want to perform batch or incremental learning. Batch learning
generates learned rules from scratch. Incremental learning uses existing GEM output to aid in
generating the new rules. If no GEM output exists, the two types of learning will work
identically. If GEM output does exist, incremental learning will be much faster, especially if
there are only a few new events since the last rule generation. If in doubt, use incremental
learning. GEM output is stored in files of the form *_gemout where the * is p, r, m1, m2, f1 or
f2. Everything else is automatic. MEL will keep you informed of what it is doing (running GEM,
translating). To save learned rules enter:
    save_rules(filename)!

where filename is the name of the file to which the learned rules are to be written. The Prolog
version of a learned rule is highly unreadable. Use the GEM output directly to understand what the
rules mean.
The program consists of 21 modules, each consisting of about 100 lines of code. The names of
the modules and a brief description of each are listed below.

Module experiment contains the routines for conducting experiments in play. It contains
routines for manipulating temporary events (t_events) and recognizing good positions. The
threshold value for the recording of temporary events is defined here and may be changed. Three
definitions of good positions are given, but the user may wish to add others.
Module gemsetup contains the routines for setting up and running GEM. It processes events from
the database into the relational table format which is the input to GEM. It also contains the routines
for writing learned rules to a file.
Module integrity is responsible for maintaining the integrity of the database of events.
Events must belong to a single class and must be unique.
Module misc1 contains various low-level routines concerned with list manipulation. Module
misc2 contains other low-level routines which are used by other modules.
Module orient contains definitions of the sixteen possible board orientations with routines
for changing from one orientation to another.
Module record1 contains routines for maintaining the database of events including writing
events to a file, erasing events and recording events. The higher level functions of deciding
whether an event should or should not be recorded are in modules integrity and sunrecord.
Module rules contains the hard-coded rules used by the machine player. Rules for moving at
random and using human input are also expressed by rules in this module.
Module struct1 contains the definitions of basic board structures (junction, row, mill) used
by other modules. Module struct2 contains higher level structures used only by the machine player.
Module sundraw contains the routines for drawing and undrawing screen objects via the
PROLOGRAPHICS package. It also contains routines for converting screen coordinates to screen
object numbers.
Module sunexample contains routines for accepting and processing example moves.
Module sungame contains the routines for conducting a game. These include routines for
initializing, moving and determining when a game is over.
Module sunhuman is responsible for accepting and checking human input via the mouse such as
moving and selection from menus. It also contains routines for adjusting the game when a move has
been retracted.
Module sunlearn contains the routines for controlling batch and incremental learning. The
actual routines for running GEM and interpreting GEM output are in modules gemsetup and translate
respectively.
Module sunmill loads all of the other modules and prints the introductory message.
Module sunmove contains the routines for controlling move generation. It also handles removing
pieces and tracking moves for possible retraction.
Modules sunobj1 and sunobj2 contain definitions of screen objects. In general, objects
defined in sunobj1 are simpler than those defined in sunobj2. Objects in sunobj1 are used in
defining objects in sunobj2.
Module sunrecord controls the recording of moves. It also informs the user of inconsistent moves
when they are made. The actual checking, erasing and recording of the moves is done in modules
record1 and integrity.
Module translate processes GEM output in the form of variable-valued logic expressions into a
Prolog statement which can be used by MEL. It contains routines for parsing GEM output, translating
selectors and arranging the translated selectors in an optimal tree structure.
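To make the translation step concrete, the transformation performed by module translate can be sketched as follows. The sketch is in Python rather than MEL's Prolog, and the function names (parse_selectors, satisfied) and the sample selector string are illustrative assumptions, not taken from MEL's source: a GEM rule is a list of selectors of the form [variable=value-list], and a board state matches the rule when every selector admits the board's value for its variable.

```python
import re

def parse_selectors(expr):
    """Parse a VL1-style selector string such as '[s1=b][s5=w,e]'
    into a list of (variable, allowed-values) pairs."""
    pairs = []
    for var, vals in re.findall(r'\[(\w+)=([^\]]+)\]', expr):
        pairs.append((var, vals.split(',')))
    return pairs

def satisfied(expr, board):
    """True if every selector admits the board's value for its variable.
    board maps junction variables (s1, s2, ...) to 'w', 'b' or 'e'."""
    return all(board.get(var) in vals
               for var, vals in parse_selectors(expr))

# Hypothetical rule: junction s1 holds black, s5 is white or empty.
print(parse_selectors('[s1=b][s5=w,e]'))
print(satisfied('[s1=b][s5=w,e]', {'s1': 'b', 's5': 'w'}))
```

In MEL itself the parsed selectors are emitted as Prolog clauses and rearranged into a tree so that the most discriminating tests are tried first; the sketch above only illustrates the selector semantics.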
Module valid contains the definitions of valid moves.
Prolog is nearly self-documenting and the interested reader is invited to examine the program listing
for specifics on MEL's operation. Every attempt has been made to make variable and predicate
names mnemonic.
MEL uses several external files. Binary files exscreen and jscreen contain definitions of the example
screen and the labeled board junctions, respectively. They can be generated by MEL dynamically,
but this is slow. Events and their corresponding GEM rules are written to files of the form *_events
and *_gemout, respectively, where * is p, r, m1, m2, f1, or f2. Prolog learned rules are built in file
lrule_file. When incremental learning is being performed, output from the previous invocation of GEM
is used as input to the current invocation and is stored in inhypo. File inhypoed contains the stream
editor (sed) commands for changing GEM output to GEM input hypotheses. Files gemhead1 and gemhead2
contain static GEM input (everything but inhypo tables and event tables), for one and two part moves
respectively.
REFERENCES
Boulanger, A.B., "The Expert System PLANT/cd: A Case Study in Applying the General Purpose Inference System ADVISE to Predicting Black Cutworm Damage in Corn," Report No. UIUCDCS-R-83-1134, Department of Computer Science, University of Illinois, 1983.
Channic, T., "PROLOGRAPHICS: A Graphics Interface for Prolog," unpublished, 1984.
Hoff, W., Michalski, R.S. and Stepp, R.E., "INDUCE-2: A Program for Learning Structural Descriptions from Examples," Report No. UIUCDCS-F-83-904, Department of Computer Science, University of Illinois, 1983.
Michalski, R.S., "Discovering Classification Rules Using Variable-Valued Logic System VL1," Third International Joint Conference on Artificial Intelligence, pp. 162-172, 1973.
Michalski, R.S., "Synthesis of Optimal and Quasi-Optimal Variable-Valued Logic Formulas," Proceedings of the 1975 International Symposium on Multiple-Valued Logic, pp. 76-87, 1975.
Michalski, R.S., "A Theory and Methodology of Inductive Learning," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 83-134, Tioga, Palo Alto, CA, 1983.
Michalski, R.S. and Chilausky, R.L., "Learning by Being Told and Learning from Examples: An Experimental Comparison of Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis," International Journal of Policy Analysis and Information Systems, Vol. 4, No. 2, pp. 125-160, 1980.
Michie, D., "Experiments on the Mechanization of Game Learning," Computer Journal, Vol. 25, No. 1, pp. 105-112, 1982.
Morehead, A.H. and Mott-Smith, G. (Eds.), Hoyle Up-to-Date, Grosset & Dunlap, New York, NY, 1976.
Quinlan, J.R., "Discovering Rules from Large Collections of Examples: A Case Study," Expert Systems in the Micro-Electronic Age, Michie, D. (Ed.), pp. 168-201, Edinburgh University Press, Edinburgh, 1979.
Quinlan, J.R., "Learning Efficient Classification Procedures and their Application to Chess End Games," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 463-481, Tioga, Palo Alto, CA, 1983.
Reinke, R.E., "A Structured Black-to-Win Decision Tree for the Chess Endgame KP vs. KR (Pa7)," Internal Report, Intelligent Systems Group, Department of Computer Science, University of Illinois, 1982.
Reinke, R.E., "Knowledge Acquisition and Refinement Tools for the ADVISE Meta-Expert System," M.S. Thesis, Department of Computer Science, University of Illinois, 1984.
Scarne, J., Scarne's Encyclopedia of Games, pp. 532-533, Harper & Row, New York, NY, 1973.
Shapiro, A. and Niblett, T., "Automatic Induction of Classification Rules for a Chess Endgame," Advances in Computer Chess, Clarke, M.R. (Ed.), Edinburgh University Press, Edinburgh, 1982.
[Sample GEM output. The original appendix reproduced four output hypothesis (outhypo) tables, one per move class, each a numbered list of complexes in VL1 selector notation of the form [variable=value-list] over the board-junction variables s1 through s20 and the color variable c. The listing is too badly garbled in this copy to reconstruct reliably.]

This run used 180450 milliseconds of CPU time; system time: 700.
BIBLIOGRAPHIC DATA SHEET
1. Report No.: UIUCDCS-F-85-931
4. Title and Subtitle: MEL - A Learning Program that Improves by Experience in Playing the Game of MILL
5. Report Date: January 1985
7. Author(s): Paul Hoffman
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois, 1304 W. Springfield Avenue, Urbana, IL 61801
11. Contract/Grant No.: NSF DCR 84-06801; ONR N00014-82-K-0186
12. Sponsoring Organization Name and Address: Office of Naval Research, Arlington, VA; National Science Foundation, Washington, DC
16. Abstracts: This paper describes a program able to learn how to play the board game MILL. The program, called MEL, acquires its knowledge in much the same way a human player would: from a teacher, by observing games or by playing games itself. This knowledge is recorded in the form of examples of play. When a number of examples have been assimilated, MEL invokes program GEM to induce rules of play from the examples. The induced (or learned) rules are generalizations of the examples. Finally, MEL translates and reorganizes the learned rules so that they can be used by the program to play the game.
17. Key Words and Document Analysis. 17a. Descriptors:
Machine Learning
Induction
Game Playing
Learning Rules from Examples
Self-Improvement Programs
17b. Identifiers/Open-Ended Terms