
File No. UIUCDCS-F-85-931

MEL - A Learning Program that Improves by Experience in Playing the Game of MILL

    Paul Hoffman

Department of Computer Science

    University of Illinois

at Urbana-Champaign

    January 1985

    ISG 85-2


ACKNOWLEDGMENTS

This project was made possible through the help and support of several people. Primary thanks go to my advisor, Dr. R. S. Michalski, for allowing me the freedom to develop my own project. The project is an application of his basic research in machine learning. Dr. Claude Sammut introduced me to the joys of PROLOG and encouraged this work in its infancy as a class project. I am indebted to several members of the Intelligent Systems Group at the University of Illinois. Chief among these are: Tom Channic for help with his PROLOGRAPHICS package, Tony Nowicki for system support and Bruce Katz for valuable discussions about GEM and game-playing in general.

I am grateful for the emotional and financial support of my parents. I couldn't have done it without them.

This research was supported, in part, by the National Science Foundation under grant DCR 84-06801 and the Office of Naval Research under grant N00014-82-K-0186.


    TABLE OF CONTENTS

1. INTRODUCTION
2. THE GAME OF MILL
3. THE LEARNING PROCESS
   3.1 Learning by Example
       3.1.1 Recording Events
       3.1.2 Events from Observation
       3.1.3 Events from a Teacher
       3.1.4 Events from Experimentation
   3.2 Codifying Experience
4. GEM
APPENDIX A: USER'S GUIDE
REFERENCES


    1. INTRODUCTION

Most programs for playing non-trivial competitive games use some variant of the minimax algorithm, first suggested in the early 1950's by Claude Shannon. Move selection is accomplished by generating a tree of moves, replies to those moves, replies to the replies, and so on. The best move is assumed to be the one which leads to the best position at some arbitrary depth in the tree. For such programs, the quality of play depends on how much of the tree can be generated and evaluated, given certain time and/or space restrictions. The epitome of this approach is Belle [Condon and Thompson 82], the current world computer chess champion, which examines just under three million positions in the average three minutes per move it is allotted in tournament play.
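For illustration, the core of such a search can be written in a few Prolog clauses. This is a minimal fixed-depth minimax in negamax form, not MEL's code; move/2 (legal successor position) and static_value/2 (evaluation from the viewpoint of the side to move) are assumed, game-specific predicates:

    % Minimal fixed-depth minimax (negamax form). move(Pos, Next) and
    % static_value(Pos, Val) are hypothetical game-specific predicates.
    negamax(Pos, 0, Val, none) :-              % depth exhausted: evaluate
        static_value(Pos, Val), !.
    negamax(Pos, _, Val, none) :-              % no legal moves: evaluate
        \+ move(Pos, _),
        static_value(Pos, Val), !.
    negamax(Pos, Depth, Val, Best) :-
        Depth > 0,
        findall(M, move(Pos, M), Moves),
        D1 is Depth - 1,
        best_move(Moves, D1, -1000000, none, Val, Best).

    best_move([], _, Val, Best, Val, Best).
    best_move([M|Ms], D, Val0, Best0, Val, Best) :-
        negamax(M, D, OppVal, _),
        MyVal is -OppVal,                      % opponent's gain is our loss
        (   MyVal > Val0
        ->  best_move(Ms, D, MyVal, M, Val, Best)
        ;   best_move(Ms, D, Val0, Best0, Val, Best)
        ).

The initial bound of -1000000 is assumed to be lower than any static value; the quality of play then depends entirely on the depth and on static_value/2, exactly the trade-off described above.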

The few human players who can defeat Belle typically examine no more than 100 positions per move. They rely instead on a vast knowledge of the game. Endowing a program with such knowledge is a difficult task. The basic problem is one of knowledge acquisition. Human experts (in various fields) often have difficulty expressing exactly how and why they arrived at a particular decision. [Michie 82] illustrates the problem with the story of a cheese factory famous for its camemberts.

    . . . every hundredth cheese was sampled to ensure that the production process was still on the narrow path separating the marginally unripe from the marginally over-ripe. Success rested on the uncanny powers developed by one very old man, whose procedure was to thrust his index finger into the cheese, close his eyes and utter an opinion. If only because of the expert's age and frailty, automation seemed to be required, and an ambitious R&D project was launched. After much escalation of cost and elaboration of method, no progress had been registered. Substantial inducements were offered to the sage for a precise account of how he did the trick. He could offer little, beyond the advice: "It's got to feel right!" In the end it turned out that feel had nothing to do with it. After breaking the crust with his finger, the expert was interpreting subliminal signals from his sense of smell.

This paper describes an attempt at constructing a knowledge-based player for the board game Mill. The program, MEL1, acquires its knowledge in much the same way a human player would - from a teacher, by observing games or by playing games itself. This knowledge is recorded in the form of examples of play. When a number of examples have been assimilated, MEL invokes program GEM to induce rules of play from the examples. The induced (or learned) rules are generalizations of the examples. Finally, MEL translates and reorganizes the learned rules so that they can be used by the program to play the game.

1 The name MEL is composed of the first, second and third letters, respectively, of Machine Learning (of) Mill.

    2. THE GAME OF MILL

Mill2 is an old game, having been played by the ancient Greeks. It derives its name from the repetitive moving of a player's pieces (stones) to grind down an opponent. In England, the game is known as Nine Men's Morris after its resemblance to "morris" (Moorish) dances.

Mill is played on the board shown in Figure 1. The players, White and Black, are each equipped with nine playing pieces of their color. The play can be partitioned into three distinct phases: placing, moving, and flying.

    1-----------2-----------3
    |           |           |
    |   4-------5-------6   |
    |   |       |       |   |
    |   |   7---8---9   |   |
    |   |   |       |   |   |
    10--11--12      13--14--15
    |   |   |       |   |   |
    |   |   16--17--18  |   |
    |   |       |       |   |
    |   19------20------21  |
    |           |           |
    22----------23----------24

Figure 1. The Mill board with labeled junctions.

2 [Morehead and Mott-Smith 76] give a brief history. [Scarne 73] discusses some basic strategy.


The game begins with an empty board in the placing phase. The players (White first) alternately place their pieces at any one of the 24 junctions where two lines intersect, provided no piece has yet been placed there. When all pieces have been placed the game enters the moving phase. Players now move their pieces along the lines to adjacent unoccupied junctions. The object of this maneuvering is to align three pieces of the same color on the same line. Such an arrangement is called a mill and the player who forms one is entitled to remove one opposing piece, provided that that piece is not itself part of a mill. Once a mill has been formed and a piece removed, it may on a subsequent turn be opened by moving one piece off the line. If the mill is then reformed (closed), another piece may be removed. Much of the strategy in the moving phase involves opening and closing mills at the right times. When a player has been reduced to three pieces, he enters the flying phase. He is no longer restricted to moving between adjacent junctions, but may move to any empty junction. A player loses when he has been reduced to two pieces, or when he cannot move. Games between good players usually end in draws.
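The board geometry is small enough to write down directly. The following sketch (standard Prolog; the Junction-Color pair representation and the predicate names are illustrative, not MEL's internal format) lists the sixteen lines of Figure 1 and tests for a mill:

    % The sixteen lines of Figure 1, read off the board: eight
    % horizontal and eight vertical (including the two spokes
    % 2-5-8 and 17-20-23 that connect the squares).
    line(1,2,3).    line(4,5,6).    line(7,8,9).    line(10,11,12).
    line(13,14,15). line(16,17,18). line(19,20,21). line(22,23,24).
    line(1,10,22).  line(4,11,19).  line(7,12,16).  line(2,5,8).
    line(17,20,23). line(9,13,18).  line(6,14,21).  line(3,15,24).

    % mill(Color, Board): Board is a list of Junction-Color pairs;
    % succeeds when three pieces of Color stand on one line.
    mill(C, Board) :-
        line(A, B, D),
        member(A-C, Board), member(B-C, Board), member(D-C, Board).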

Variations of the game exist. Most involve the flying phase, either eliminating it entirely or varying the number of pieces with which a player may fly. In some games a piece may be removed from a mill if there are no other choices. Go-bang is a related game in the family of Go games.

For the purposes of the program, removing was added as a fourth phase. A player enters this phase when he has formed a mill and returns to his previous phase upon capturing an opposing piece.

Due to the symmetry of the board, there are only four move types. They correspond to moves at junctions 1, 2, 4, and 5 and are called t1, t2, t3, and t4 events, respectively. (The t stands for type.) The program makes extensive use of this symmetry, using it to reduce the amount of data and the complexity of the rules. For example, opening moves to junctions 1, 3, 7, 9, 16, 18, 22 or 24 (all t1 moves) are considered identical and are treated internally as if the move had been made to junction 1. All of this is invisible to the user, however, who sees board pieces at the junctions where they were placed.
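The partition of the 24 junctions into the four classes can be sketched as follows. The t1 set is the one just given in the text; the other three sets are inferred here from the same board symmetry and should be read as an illustration rather than as MEL's code:

    % Move type of a junction (a sketch; the t1 set is from the text,
    % the others follow from the symmetry of Figure 1).
    move_type(J, t1) :- member(J, [1,3,7,9,16,18,22,24]).
    move_type(J, t2) :- member(J, [2,8,10,12,13,15,17,23]).
    move_type(J, t3) :- member(J, [4,6,19,21]).
    move_type(J, t4) :- member(J, [5,11,14,20]).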


    3. THE LEARNING PROCESS

When certain principles of play are well known and straightforward, it is useful to sidestep learning and provide this knowledge in some direct manner. MEL began as a class project in just this way, using static, programmed rules for play. The only way to increase the level of play was to write more (or better) rules. The advantage of rote learning3 is the speed with which the knowledge is gained. The disadvantages are the difficulties of preparing such knowledge initially and modifying it later.4

3 The classification of learning into rote learning, learning by being told, learning by analogy, learning from examples and learning by observation and discovery is due to [Carbonell, Michalski and Mitchell 83].

4 Rote learning still plays a small part in MEL. The programmed rules are still present in the guise of the machine player type. Also, MEL has been programmed with the definitions of legal moves.

    3.1. Learning by Example

The method of learning which MEL uses is learning by example. The examples are moves to or from a junction and are called events. Moves made during the placing or removing phases are each represented by a single event. Moves made during the moving or flying phases are represented by two events, one for the "from" portion and one for the "to" portion. Each event is a set of attributes. Attributes are such facts as the color of the moving player and the colors of each of the 24 junctions. Events representing the "from" portion of a moving or flying move have an additional attribute indicating the junction from which the piece was moved.

MEL organizes events according to move type and event type. Event types correspond to the game phases except that moving and flying are subdivided into their "from" and "to" portions. The event types are denoted p, r, m1, m2, f1 and f2 for placing, removing, moving from, moving to, flying from and flying to, respectively. Events of each type are further divided according to move type (t1, t2, t3 and t4).

    3.1.1. Recording Events

Recording an event is a complicated process. The board is first reoriented (normalized) so that the junction of the move (from or to) becomes one of the junctions 1, 2, 4 or 5. A move to junction 3, for example, causes the board to be rotated 90 degrees counterclockwise so that junction 3 is in the position of junction 1. The normalized event must then be checked to make sure it is not a duplicate of one already recorded. Events which differ merely in orientation are considered duplicates. Then the event must be checked for consistency. The database of events is consistent if each event belongs to only one of the four classes (t1, t2, t3 or t4). Inconsistencies arise when the same event is classified as, say, both a t1 and a t2 event. The implicit assumption is that for any given position (set of attributes) only one move (classification) is correct. If an event is consistent and unique, it is recorded. If a move is inconsistent, the original classification can be allowed to stand or the event can be reclassified.
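For illustration, the duplicate and consistency checks might look like this in Prolog (event/2 as a dynamic fact base and record_event/2 are assumed names, not MEL's actual predicates):

    :- dynamic event/2.      % event(Class, Attrs), Class in {t1,t2,t3,t4}

    % record_event(+Class, +Attrs): Attrs is an already normalized event.
    record_event(Class, Attrs) :-
        event(Class, Attrs), !.                % exact duplicate: ignore
    record_event(Class, Attrs) :-
        event(Other, Attrs),
        Other \== Class, !,
        format('inconsistent: already recorded as a ~w event~n', [Other]).
    record_event(Class, Attrs) :-
        assertz(event(Class, Attrs)).          % consistent and unique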

MEL obtains experience in the form of events from three different sources. It can observe a game and record the moves of either or both players. It can also be given specific examples by an external expert or teacher. Finally, MEL can provide its own examples by playing games and recording those moves which lead to some desired outcome. The next three sections deal with these sources individually.

3.1.2. Events from Observation

MEL allows six player types to compete in a game. Observing the moves of either or both players is one way MEL gains experience. The player should be consistent in his moves (and will be informed if he is not). The more skillful the player, the better the experience MEL will gain.

A human is one player type which can be observed. Moves are input via a mouse. If a move is inconsistent with a previous move, the human player determines which is correct. A skilled human player provides the best examples from observation. The drawback is that playing the number of games required to provide a good set of examples requires a good deal of patience.

Another observable player is the machine player, which uses a programmed (not learned) set of rules to generate moves. This player is not highly skilled, but it removes the tedium of human move generation and is suitable for obtaining examples of reasonable (if not brilliant) play. The machine player does fairly well in the placing phase and so is most useful for providing examples of play from that phase. It plays quite poorly in the other phases and the examples gained are of little value. The reason for the poor play is poor rules. Good rules for the moving and flying phases are particularly difficult to write, and this difficulty was the main inspiration for a program which could acquire knowledge automatically. When the machine player generates an inconsistent move, the original move is sustained.

The learner player type can also be observed. This player is identical to the human player, except that a list of moves generated from the learned rules is provided. The main value of this player type is for fine-tuning learned rules. For large sets of learned rules, however, generating the list of moves is quite time-consuming.

The learned player is the final player which can provide examples by observation. This player generates moves using the learned rules. The only reason to observe this player is to create a more robust set of examples. This should only be done when the learned rules are of good quality.

A player which cannot be observed is the random player. The random player generates legal moves, but since its play is inconsistent, observing it would be of little value. It does make a useful opponent for an observed player, generating moves which may be unsound, but which would never be encountered in a game between skilled players. The experimenter is the other player type which cannot be observed. This is because moves made by this player are not added to the database of events unless they lead to a favorable board position. Moves are generated from the learned rules. The experimenter player can be thought of as a learned player which is observed only when it is doing well. It will be discussed further in Section 3.1.4.

    3.1.3. Events from a Teacher

It was noted in Section 3 that direct implantation of knowledge is often useful. MEL allows examples of play to be presented directly. Events obtained in this manner are allowed to have a junction color attribute of "don't care" in addition to the usual white, black or empty. Suppose the teacher wishes to provide an example of white completing a mill to help MEL learn that concept. An example which might be provided is an empty junction 1 with junctions 2 and 3 white. The colors of the other junctions are of no interest and can be valued as don't-care. Examples provided by a teacher help reduce the number of events which must be recorded when a player is observed during a game. If the player makes a move which matches, attribute for attribute, a teacher-provided example, MEL can disregard the observed example since there is no new knowledge to be gained. (The don't-care matches any value.)
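Matching an observed move against a teacher example then reduces to a positionwise comparison. A minimal sketch, with dc as an assumed encoding for don't-care:

    % matches_example(+Example, +Observed): attribute lists of equal
    % length; a dc in the teacher's example matches any observed value.
    matches_example([], []).
    matches_example([dc|Ex], [_|Obs]) :- !,
        matches_example(Ex, Obs).
    matches_example([V|Ex], [V|Obs]) :-
        matches_example(Ex, Obs).

    % The example above (junction 1 empty, junctions 2 and 3 white):
    % ?- matches_example([e,w,w,dc,dc,dc,dc,dc], [e,w,w,b,e,b,b,b]).
    % true.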

    3.1.4. Events from Experimentation

The experimenter player type allows MEL to provide its own examples. Instead of recording moves as events as soon as they are observed, they are recorded as temporary events called t_events. When a favorable position is reached, a count associated with the t_event is incremented. When this count reaches a certain threshold, the t_event is rerecorded as a regular event. If the threshold were two, for example, no event would be recorded until it had twice led to a favorable position. (The idea of thresholds is due to R.S. Michalski.) A high value for the threshold provides high quality events at a slow pace. A low value provides many events of lesser quality. Definitions of favorable positions must be provided, but they can be as simple as a won game.
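The threshold mechanism amounts to a counter per temporary event. A sketch under assumed names (t_event/3 and credit_t_event/2 are illustrative, not MEL's predicates):

    :- dynamic event/2, t_event/3.     % t_event(Class, Attrs, Count)

    threshold(2).                      % promotion threshold (adjustable)

    % credit_t_event(+Class, +Attrs): called when the move has led to
    % a favorable position; promotes the t_event at the threshold.
    credit_t_event(Class, Attrs) :-
        (   retract(t_event(Class, Attrs, N))
        ->  N1 is N + 1
        ;   N1 = 1
        ),
        threshold(T),
        (   N1 >= T
        ->  assertz(event(Class, Attrs))       % rerecord as regular event
        ;   assertz(t_event(Class, Attrs, N1))
        ).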

    3.2. Codifying Experience

Having assimilated a number of events, MEL has a rather useless assemblage of knowledge. Unless a situation is encountered for which an event has already been recorded (an improbable event), MEL's experience is of no value. What is needed is some method of transferring the specific knowledge of the events to a more general form which can be applied to new situations. Specifically, once we have a set of events in which a corner move (t1) is appropriate, we want a description of this set which includes all of the t1 events and no events from the other three classes.

MEL transforms its knowledge using instance-to-class generalization. After a number of events have been collected, MEL uses program GEM to generate generalized rules for each event type. These rules describe the events of each move type and distinguish them from the events of the other three move types.

    t1-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  e  w  w  b  e  b  b  b
    2  w  e  w  w  e  w  e  e  w
    3  w  e  w  w  e  b  e  e  b

    t2-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  w  e  w  e  b  e  b  e
    2  w  w  e  e  b  w  e  e  w
    3  w  w  e  w  w  b  w  e  e
    4  w  e  e  e  e  w  e  e  w

    t3-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  e  e  e  e  w  w  e  e

    t4-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  w  b  b  e  w  e  w  e
    2  w  w  b  w  e  w  e  e  e

Figure 2. Example GEM input.

An example will make this clear. Figure 2 shows GEM input in the form of a relational table. The table represents a collection of events of type placing, grouped by move type. Each row is an event and each column contains values for a particular attribute. The attributes in the column labeled c are the colors of the moving player (white or black). The other attributes are the colors (white, black or empty) of the board junctions. The events in this example all show white placing to complete a mill (the boldfaced attributes). Thus, event #1 of the t1 events shows white moving to junction 1 when junctions 2 and 3 are white (which completes a white mill at 1-2-3). Only the first eight board junctions are shown, although in actual practice all 24 would be present.
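In Prolog, the three t1 rows of Figure 2 could be stored as facts such as these (the attribute=value list encoding is illustrative):

    % The t1 events of Figure 2 (junctions s9..s24 omitted, as there).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=b, s5=e, s6=b, s7=b, s8=b]).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=e, s5=w, s6=e, s7=e, s8=w]).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=e, s5=b, s6=e, s7=e, s8=b]).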

Figure 3 shows some possible GEM output for the example in Figure 2. A set of disjunctive complexes form the descriptions for each class. The complexes are formed from conjunctive selectors (the bracketed expressions). Thus, the description of a t2 move in English is: (1) junction 2 is empty and junctions 1 and 3 are white, or (2) junction 2 is empty and junctions 5 and 8 are white.
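Evaluating such a description is a disjunction of conjunctions. A sketch (forall/2 as in SWI-Prolog; the board-as-attribute-list representation is assumed):

    % satisfies(+Board, +Complex): every selector of the complex holds
    % on the board; both are lists of Attribute=Value pairs.
    satisfies(Board, Complex) :-
        forall(member(Sel, Complex), member(Sel, Board)).

    % The t2 description of Figure 3 as a disjunction of two complexes:
    t2_applies(Board) :-
        (   satisfies(Board, [s1=w, s2=e, s3=w])
        ;   satisfies(Board, [s2=e, s5=w, s8=w])
        ).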

The classification rules produced by GEM are generalizations because they describe more situations than did the original events. The salient features of t1 events, according to GEM, are that junction 1 is empty and junctions 2 and 3 are white. In addition, this description does not describe events of any other move type. If a new event is encountered which fits this general description, the general rule can be used to decide that a move to junction 1 is appropriate. If a new event is encountered which (according to the rules) belongs to more than one class, the rules are too general. If the new event is described by none of the rules, the rules are too specific. In either case, the new event should be added to the database of events and GEM rerun.

    t1-outhypo
    # cpx
    1  [s1=e] [s2=w] [s3=w]

    t2-outhypo
    # cpx
    1  [s1=w] [s2=e] [s3=w]
    2  [s2=e] [s5=w] [s8=w]

    t3-outhypo
    # cpx
    1  [s4=e] [s5=w] [s6=w]

    t4-outhypo
    # cpx
    1  [s2=w] [s5=e] [s8=w]
    2  [s2=w] [s5=e] [s8=w]

Figure 3. Example GEM output.

MEL does not use GEM output directly. The rules are translated into a single Prolog statement. The Prolog statement is a representation of an optimal binary tree for evaluating which complexes are satisfied (and thus which move to make). The nodes of the tree are the selectors and the branches represent whether or not the node was satisfied. The selector which appears at the top of any subtree is the selector which appears in the most complexes represented by that subtree. Therefore, the root node of the tree (which represents all complexes) is the selector which appears most often. In the example, the root node is the selector [s2=w] since it appears in three complexes. The left subtree under the root represents the rules in which [s2=w] appears and the right subtree those in which it does not appear. With this arrangement, the number of times each complex must be evaluated is minimized. The learned rules apply to a board in the normal orientation. The board may need to be reoriented to check for all rules which can be satisfied.
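The root-selection step can be sketched by counting, for each selector, the complexes that contain it (aggregate_all/3 and max_member/2 are SWI-Prolog library predicates; module translate's real code of course differs):

    % most_common_selector(+Complexes, -Sel): Sel occurs in the largest
    % number of complexes and becomes the root of the evaluation tree.
    most_common_selector(Complexes, Sel) :-
        findall(N-S,
                (   member(Cpx, Complexes), member(S, Cpx),
                    aggregate_all(count,
                                  ( member(C2, Complexes), member(S, C2) ),
                                  N)
                ),
                Counted),
        max_member(_-Sel, Counted).

    % ?- most_common_selector([[a=1,b=2], [a=1,c=3], [a=1]], Sel).
    % Sel = (a=1).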

    4. GEM

GEM (or more precisely GEM1.0 [Reinke 84]) is the latest in a series of induction programs developed by the Intelligent Systems Group at the University of Illinois at Urbana-Champaign. The AQ algorithm is at the heart of the various versions of GEM. Briefly, the AQ algorithm produces descriptions of classes of events. Each event is a vector of attribute values. The attribute values are discrete and belong to finite domains. For an easy-to-understand explanation of how the algorithm works, consult [Reinke 84]. For a detailed theoretical discussion see [Michalski 75].
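The covering idea at the heart of AQ can be caricatured in a few clauses. This is a deliberate simplification under assumed names, not GEM's algorithm: real AQ grows a star of alternative complexes from a seed event and chooses among them with an LEF, while this sketch specializes a single complex just far enough to exclude the negative events:

    % covers(+Complex, +Event): each selector of the complex holds in
    % the event (selectors here are single Attribute=Value tests).
    covers(Complex, Event) :-
        forall(member(Sel, Complex), member(Sel, Event)).

    % learn_complex(+Seed, +Negatives, -Complex): add selectors taken
    % from the positive seed event until no negative event is covered.
    learn_complex(Seed, Negatives, Complex) :-
        specialize(Seed, Negatives, [], Complex).

    specialize(_, Negatives, Acc, Acc) :-
        \+ ( member(Neg, Negatives), covers(Acc, Neg) ), !.
    specialize(Seed, Negatives, Acc, Complex) :-
        member(Sel, Seed),
        \+ member(Sel, Acc),
        specialize(Seed, Negatives, [Sel|Acc], Complex).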

GEM provides input and output for AQ which is geared for use by a knowledge engineer. As mentioned earlier, GEM input is in the form of relational tables and output is in the form of variable-valued logic expressions. Additional input information regarding attributes and their domains can be provided. GEM allows a cost to be associated with each attribute. This can be viewed as the cost for evaluating the attribute or as a measure of importance of an attribute, a lower value implying less cost or more importance. LEF's (Lexicographical Evaluation Functions) tell GEM what type of complexes to form. An LEF can specify short complexes, long complexes, or complexes utilizing the most important attributes. MEL takes advantage of this and places greater importance on the junctions which are closest to the normalized junctions 1, 2, 4 and 5. This provides a focus of attention in the rules near the move junctions. The attributes which are cheapest to evaluate are the color of the moving player and the "from" junction of a moving or flying move. MEL uses LEF's which dictate short, cheap complexes consisting of junctions near where a move is to be made.
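Such a preference could be sketched as a lexicographic comparison (the cost values and predicate names are invented for illustration; GEM's actual LEF machinery is richer):

    % Assumed attribute costs: the mover's color and the "from"
    % junction are cheapest; junctions near the normalized move
    % junctions come next, everything else last.
    cost(c, 1).
    cost(from, 1).
    cost(s1, 2). cost(s2, 2). cost(s4, 2). cost(s5, 2).
    cost(_, 3).                        % all other junctions

    total_cost(Complex, K) :-
        findall(W, ( member(A=_, Complex), once(cost(A, W)) ), Ws),
        sum_list(Ws, K).

    % lef_better(+C1, +C2): prefer fewer selectors, then lower cost.
    lef_better(C1, C2) :-
        length(C1, L1), length(C2, L2),
        (   L1 < L2
        ;   L1 =:= L2,
            total_cost(C1, K1), total_cost(C2, K2),
            K1 < K2
        ).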


    APPENDIX A

    USER'S GUIDE

MEL is written in UNSW Prolog [Sammut 83] and uses PROLOGRAPHICS [Channic 83] for graphics on a Sun Microsystems display. A stripped-down non-interactive version with no display was used for lengthy batch runs.

MEL is loaded from a window on the Sun by entering:

    gprolog -s300000 sunmill [event-file] [rule-file]

The request for such a huge amount of stack space (-s300000) is unnecessary if no GEM output will be translated to Prolog. A request of 50000 units will suffice if no translation is to be done. Due to the program's size, MEL takes about a minute to load. Optional files for events and learned rules may follow the program file and will extend the load time.

Most input is accomplished through a three-button mouse. Prompts are provided as to which button to push. Multiple choice questions with more than three alternatives are selected from menus by positioning the mouse over the desired choice and pressing any mouse button. Moves are selected in a similar manner by positioning the pointer at or near the desired junction or piece. UNSW Prolog commands must be entered from the keyboard. Remember that the sunwindow package requires the mouse pointer to be in the text window for keyboard input.

After MEL has finished loading, you will be prompted for gprolog commands. The command to begin playing is:

    play_game!

You will then be asked whether you want instructions. An affirmative answer scrolls a brief description of the game in the text window. The next thing to decide, from a menu of the six possible choices, is what types of players should compete. Refer to the descriptions of the player types in Section 3.1.2 to help you decide. Finally, you will be prompted as to whether either player's moves should be recorded. Remember that the moves of player types random and experimenter cannot be recorded. Play begins after this information has been entered.

If neither player requires human input, the game will progress on its own until one player wins or 50 moves have been made by each side. Players requiring human input (types human and learner) indicate their moves by placing the mouse pointer over the desired junction or piece and pressing a mouse button. If a player has a single choice of move, the move will be selected automatically. On any player's move, the single-item menu labeled "RETRACT MOVE" may be selected. This action returns the game to the state it was in before that player's last move. Events (regular and temporary) which were recorded for retracted moves are also erased. If the player was selecting the "to" portion of a moving or flying move, RETRACT MOVE returns the game to the start of the current move. After the game has concluded, you will again be prompted for gprolog commands.

In addition to the regular UNSW Prolog commands for editing files, etc., MEL has several useful commands which can be entered from the keyboard. They are listed below with a brief description.

play_game!            begin a game.

                      stops a game (or any other action). This is not usually dangerous unless it is done while something is being drawn on the screen.

learning(yes)!        gives you the menu for learning. See below for details.

save_events(file)!    saves all regular events in file.

clear_events!         removes all regular events.

save_t_events(file)!  saves all temporary events in file.

clear_t_events!       removes all temporary events.

save_rules(file)!     saves learned rules in file.


example!              allows the input of examples. See below for details.

    To provide specific examples of play, enter (in the text window):

    example!

You will be asked to select the game phase for which your example applies from a menu. After you have chosen the phase, a board will be drawn with a question mark at each junction. The question marks indicate that the color of that junction is irrelevant to the example and may have any value. Fill in the junctions that are relevant by positioning the mouse pointer at the junction and pressing button 1 for a white piece, 2 for an empty junction, and 3 for a black piece. The position which you create must have at least one legal move for the chosen phase. (For placing, moving or flying, this means you must specify at least one empty junction.) You may change the colors of junctions as many times as you wish. When you are finished, select the single-item menu labeled "END". If there is more than one legal move for the example you set up, you will be asked to input that move. This is done in the same way as a move in a game. You will then be asked whether another example is to be entered or to end example input.

After MEL has acquired some examples of play, either by observing some games or from specific examples you have provided, learned rules can be generated by entering:

    learning!

You will be asked whether you want to perform batch or incremental learning. Batch learning generates learned rules from scratch. Incremental learning uses existing GEM output to aid in generating the new rules. If no GEM output exists, the two types of learning will work identically. If GEM output does exist, incremental learning will be much faster, especially if there are only a few new events since the last rule generation. If in doubt, use incremental learning. GEM output is stored in files of the form *_gemout where the * is p, r, m1, m2, f1 or f2. Everything else is automatic. MEL will keep you informed of what it is doing (running GEM, translating). To save learned rules enter:


    save_rules(filename)!

where filename is the name of the file to which the learned rules are to be written. The Prolog version of a learned rule is highly unreadable. Use the GEM output directly to understand what the rules mean.

    The program consists of 21 modules, each consisting of about 100 lines of code. The names of the

    modules and a brief description of each are listed below.

Module experiment contains the routines for conducting experiments in play. It contains routines for manipulating temporary events (t_events) and recognizing good positions. The threshold value for the recording of temporary events is defined here and may be changed. Three definitions of good positions are given, but the user may wish to add others.

    Module gemsetup contains the routines for setting up and running GEM. It processes events from

    the database into the relational table format which is the input to GEM. It also contains the routines

    for writing learned rules to a file.

Module integrity is responsible for maintaining the integrity of the database of events. Events must belong to a single class and must be unique.

Module misc1 contains various low-level routines concerned with list manipulation. Module misc2 contains other low-level routines which are used by other modules.

Module orient contains definitions of the sixteen possible board orientations with routines for changing from one orientation to another.

Module record1 contains routines for maintaining the database of events, including writing events to a file, erasing events and recording events. The higher level functions of deciding whether an event should or should not be recorded are in modules integrity and sunrecord.

Module rules contains the hard-coded rules used by the machine player. Rules for moving at random and using human input are also expressed by rules in this module.


Module struct1 contains the definitions of basic board structures (junction, row, mill) used by other modules. Module struct2 contains higher level structures used only by the machine player.

Module sundraw contains the routines for drawing and undrawing screen objects via the PROLOGRAPHICS package. It also contains routines for converting screen coordinates to screen object numbers.

    Module sunexample contains routines for accepting and processing example moves.

Module sungame contains the routines for conducting a game. These include routines for initializing, moving and determining when a game is over.

    Module sunhuman is responsible for accepting and checking human input via the mouse such as

    moving and selection from menus. It also contains routines for adjusting the game when a move has

    been retracted.

Module sunlearn contains the routines for controlling batch and incremental learning. The actual routines for running GEM and interpreting GEM output are in modules gemsetup and translate, respectively.

Module sunmill loads all of the other modules and prints the introductory message.

    Module sunmove contains the routines for controlling move generation. It also handles removing

    pieces and tracking moves for possible retraction.

Modules sunobj1 and sunobj2 contain definitions of screen objects. In general, objects defined in sunobj1 are simpler than those defined in sunobj2. Objects in sunobj1 are used in defining objects in sunobj2.

Module sunrecord controls the recording of moves. It also informs the user of inconsistent moves when they are made. The actual checking, erasing and recording of the moves is done in modules record1 and integrity.


    Module translate processes GEM output in the form of variable-valued logic expressions into a

    Prolog statement which can be used by MEL. It contains routines for parsing GEM output, translating

    selectors and arranging the translated selectors in an optimal tree structure.

Module valid contains the definitions of valid moves.

Prolog is nearly self-documenting and the interested reader is invited to examine the program listing for specifics on MEL's operation. Every attempt has been made to make variable and predicate names mnemonic.

MEL uses several external files. Binary files exscreen and jscreen contain definitions of the example screen and the labeled board junctions, respectively. They can be generated by MEL dynamically, but this is slow. Events and their corresponding GEM rules are written to files of the form *_events and *_gemout, respectively, where * is p, r, m1, m2, f1, or f2. Prolog learned rules are built in file lrule_file. When incremental learning is being performed, output from the previous invocation of GEM is used as input to the current invocation and is stored in inhypo. File inhypoed contains the stream editor (sed) commands for changing GEM output to GEM input hypotheses. Files gemhead1 and gemhead2 contain static GEM input (everything but inhypo tables and event tables), for one and two part moves respectively.


    REFERENCES

Boulanger, A.B., "The Expert System Plant/cd: A Case Study in Applying the General Purpose Inference System ADVISE to Predicting Black Cutworm Damage in Corn," Report No. UIUCDCS-R-83-1134, Department of Computer Science, University of Illinois, 1983.

Channic, T., "PROLOGRAPHICS: A Graphics Interface for Prolog," unpublished, 1984.

Hoff, W., Michalski, R.S. and Stepp, R.E., "INDUCE-2: A Program for Learning Structural Descriptions from Examples," Report No. UIUCDCS-F-83-904, Department of Computer Science, University of Illinois, 1983.

Michalski, R.S., "Discovering Classification Rules Using Variable-Valued Logic System VL1," Third International Joint Conference on Artificial Intelligence, pp. 162-172, 1973.

Michalski, R.S., "Synthesis of Optimal and Quasi-Optimal Variable-Valued Logic Formulas," Proceedings of the 1975 International Symposium on Multiple-Valued Logic, pp. 76-87, 1975.

Michalski, R.S., "A Theory and Methodology of Inductive Learning," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 83-134, Tioga, Palo Alto, CA, 1983.

Michalski, R.S. and Chilausky, R.L., "Learning by Being Told and Learning from Examples: An Experimental Comparison of Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis," International Journal of Policy Analysis and Information Systems, Vol. 4, No. 2, pp. 125-160, 1980.

Michie, D., "Experiments on the Mechanization of Game Learning," Computer Journal, Vol. 25, No. 1, pp. 105-112, 1982.

Morehead, A.H. and Mott-Smith, G. (Eds.), Hoyle Up-to-date, Grosset & Dunlap, New York, NY, 1976.

Quinlan, J.R., "Discovering Rules from Large Collections of Examples: A Case Study," Expert Systems in the Micro-Electronic Age, Michie, D. (Ed.), pp. 168-201, Edinburgh University Press, Edinburgh, 1979.

Quinlan, J.R., "Learning Efficient Classification Procedures and their Application to Chess End Games," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 463-481, Tioga, Palo Alto, CA, 1983.

Reinke, R.E., "A Structured Black-to-Win Decision Tree for the Chess Endgame KP vs. KR (pa7)," Internal Report, Intelligent Systems Group, Department of Computer Science, University of Illinois, 1982.

Reinke, R.E., "Knowledge Acquisition and Refinement Tools for the ADVISE Meta-Expert System," M.S. Thesis, Department of Computer Science, University of Illinois, 1984.

Scarne, J., Scarne's Encyclopedia of Games, pp. 532-533, Harper & Row, New York, NY, 1973.

Shapiro, A. and Niblett, T., "Automatic Induction of Classification Rules for a Chess Endgame," Advances in Computer Chess, Clarke, M.R. (Ed.), Edinburgh University Press, Edinburgh, 1982.

[Sample GEM output for the placing phase: rule sets t1-outhypo through t4-outhypo, each a list of disjunctive complexes over the junction attributes, followed by CPU and system time statistics.]
