
File No. UIUCDCS-F-85-931

MEL - A Learning Program that Improves by Experience in Playing the Game of MILL

    Paul Hoffman

Department of Computer Science

    University of Illinois

at Urbana-Champaign

    January 1985

    ISG 85-2


ACKNOWLEDGMENTS

This project was made possible through the help and support of several people. Primary thanks go to my advisor, Dr. R. S. Michalski, for allowing me the freedom to develop my own project. The project is an application of his basic research in machine learning. Dr. Claude Sammut introduced me to the joys of PROLOG and encouraged this work in its infancy as a class project. I am indebted to several members of the Intelligent Systems Group at the University of Illinois. Chief among these are: Tom Channic for help with his PROLOGRAPHICS package, Tony Nowicki for system support and Bruce Katz for valuable discussions about GEM and game-playing in general.

I am grateful for the emotional and financial support of my parents. I couldn't have done it without them.

This research was supported, in part, by the National Science Foundation under grant DCR 84-06801 and the Office of Naval Research under grant N00014-82-K-0186.


    TABLE OF CONTENTS

1. INTRODUCTION
2. THE GAME OF MILL
3. THE LEARNING PROCESS
   3.1 Learning by Example
       3.1.1 Recording Events
       3.1.2 Events from Observation
       3.1.3 Events from a Teacher
       3.1.4 Events from Experimentation
   3.2 Codifying Experience
4. GEM
APPENDIX A: USER'S GUIDE
REFERENCES


    1. INTRODUCTION

Most programs for playing non-trivial competitive games use some variant of the minimax algorithm, first suggested in the early 1950's by Claude Shannon. Move selection is accomplished by generating a tree of moves, replies to those moves, replies to the replies, and so on. The best move is assumed to be the one which leads to the best position at some arbitrary depth in the tree. For such programs, the quality of play depends on how much of the tree can be generated and evaluated, given certain time and/or space restrictions. The epitome of this approach is Belle [Condon and Thompson 82], the current world computer chess champion, which examines just under three million positions in the average three minutes per move it is allotted in tournament play.
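For illustration, the core of such a search can be written in a few Prolog clauses. This is a minimal fixed-depth minimax in negamax form, not MEL's code; move/2 (legal successor position) and static_value/2 (evaluation from the viewpoint of the side to move) are assumed, game-specific predicates:

    % Minimal fixed-depth minimax (negamax form). move(Pos, Next) and
    % static_value(Pos, Val) are hypothetical game-specific predicates.
    negamax(Pos, 0, Val, none) :-              % depth exhausted: evaluate
        static_value(Pos, Val), !.
    negamax(Pos, _, Val, none) :-              % no legal moves: evaluate
        \+ move(Pos, _),
        static_value(Pos, Val), !.
    negamax(Pos, Depth, Val, Best) :-
        Depth > 0,
        findall(M, move(Pos, M), Moves),
        D1 is Depth - 1,
        best_move(Moves, D1, -1000000, none, Val, Best).

    best_move([], _, Val, Best, Val, Best).
    best_move([M|Ms], D, Val0, Best0, Val, Best) :-
        negamax(M, D, OppVal, _),
        MyVal is -OppVal,                      % opponent's gain is our loss
        (   MyVal > Val0
        ->  best_move(Ms, D, MyVal, M, Val, Best)
        ;   best_move(Ms, D, Val0, Best0, Val, Best)
        ).

The initial bound of -1000000 is assumed to be lower than any static value; the quality of play then depends entirely on the depth and on static_value/2, exactly the trade-off described above.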

The few human players who can defeat Belle typically examine no more than 100 positions per move. They rely instead on a vast knowledge of the game. Endowing a program with such knowledge is a difficult task. The basic problem is one of knowledge acquisition. Human experts (in various fields) often have difficulty expressing exactly how and why they arrived at a particular decision. [Michie 82] illustrates the problem with the story of a cheese factory famous for its camemberts.

    . . . every hundredth cheese was sampled to ensure that the production process was still on the narrow path separating the marginally unripe from the marginally over-ripe. Success rested on the uncanny powers developed by one very old man, whose procedure was to thrust his index finger into the cheese, close his eyes and utter an opinion. If only because of the expert's age and frailty, automation seemed to be required, and an ambitious R&D project was launched. After much escalation of cost and elaboration of method, no progress had been registered. Substantial inducements were offered to the sage for a precise account of how he did the trick. He could offer little, beyond the advice: "It's got to feel right!" In the end it turned out that feel had nothing to do with it. After breaking the crust with his finger, the expert was interpreting subliminal signals from his sense of smell.

This paper describes an attempt at constructing a knowledge-based player for the board game Mill. The program, MEL1, acquires its knowledge in much the same way a human player would - from a teacher, by observing games or by playing games itself. This knowledge is recorded in the form of examples of play. When a number of examples have been assimilated, MEL invokes program GEM to induce rules of play from the examples. The induced (or learned) rules are generalizations of the examples. Finally, MEL translates and reorganizes the learned rules so that they can be used by the program to play the game.

1 The name MEL is composed of the first, second and third letters, respectively, of Machine Learning (of) Mill.

    2. THE GAME OF MILL

Mill2 is an old game, having been played by the ancient Greeks. It derives its name from the repetitive moving of a player's pieces (stones) to grind down an opponent. In England, the game is known as Nine Men's Morris after its resemblance to "morris" (Moorish) dances.

Mill is played on the board shown in Figure 1. The players, White and Black, are each equipped with nine playing pieces of their color. The play can be partitioned into three distinct phases: placing, moving, and flying.

    1-----------2-----------3
    |           |           |
    |   4-------5-------6   |
    |   |       |       |   |
    |   |   7---8---9   |   |
    |   |   |       |   |   |
    10--11--12      13--14--15
    |   |   |       |   |   |
    |   |   16--17--18  |   |
    |   |       |       |   |
    |   19------20------21  |
    |           |           |
    22----------23----------24

Figure 1. The Mill board with labeled junctions.

2 [Morehead and Mott-Smith 76] give a brief history. [Scarne 73] discusses some basic strategy.


The game begins with an empty board in the placing phase. The players (White first) alternately place their pieces at any one of the 24 junctions where two lines intersect, provided no piece has yet been placed there. When all pieces have been placed the game enters the moving phase. Players now move their pieces along the lines to adjacent unoccupied junctions. The object of this maneuvering is to align three pieces of the same color on the same line. Such an arrangement is called a mill and the player who forms one is entitled to remove one opposing piece, provided that that piece is not itself part of a mill. Once a mill has been formed and a piece removed, it may on a subsequent turn be opened by moving one piece off the line. If the mill is then reformed (closed), another piece may be removed. Much of the strategy in the moving phase involves opening and closing mills at the right times. When a player has been reduced to three pieces, he enters the flying phase. He is no longer restricted to moving between adjacent junctions, but may move to any empty junction. A player loses when he has been reduced to two pieces, or when he cannot move. Games between good players usually end in draws.
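The board geometry is small enough to write down directly. The following sketch (standard Prolog; the Junction-Color pair representation and the predicate names are illustrative, not MEL's internal format) lists the sixteen lines of Figure 1 and tests for a mill:

    % The sixteen lines of Figure 1, read off the board: eight
    % horizontal and eight vertical (including the two spokes
    % 2-5-8 and 17-20-23 that connect the squares).
    line(1,2,3).    line(4,5,6).    line(7,8,9).    line(10,11,12).
    line(13,14,15). line(16,17,18). line(19,20,21). line(22,23,24).
    line(1,10,22).  line(4,11,19).  line(7,12,16).  line(2,5,8).
    line(17,20,23). line(9,13,18).  line(6,14,21).  line(3,15,24).

    % mill(Color, Board): Board is a list of Junction-Color pairs;
    % succeeds when three pieces of Color stand on one line.
    mill(C, Board) :-
        line(A, B, D),
        member(A-C, Board), member(B-C, Board), member(D-C, Board).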

Variations of the game exist. Most involve the flying phase, either eliminating it entirely or varying the number of pieces with which a player may fly. In some games a piece may be removed from a mill if there are no other choices. Go-bang is a related game in the family of Go games.

For the purposes of the program, removing was added as a fourth phase. A player enters this phase when he has formed a mill and returns to his previous phase upon capturing an opposing piece.

Due to the symmetry of the board, there are only four move types. They correspond to moves at junctions 1, 2, 4, and 5 and are called t1, t2, t3, and t4 events, respectively. (The t stands for type.) The program makes extensive use of this symmetry, using it to reduce the amount of data and the complexity of the rules. For example, opening moves to junctions 1, 3, 7, 9, 16, 18, 22 or 24 (all t1 moves) are considered identical and are treated internally as if the move had been made to junction 1. All of this is invisible to the user, however, who sees board pieces at the junctions where they were placed.
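The partition of the 24 junctions into the four classes can be sketched as follows. The t1 set is the one just given in the text; the other three sets are inferred here from the same board symmetry and should be read as an illustration rather than as MEL's code:

    % Move type of a junction (a sketch; the t1 set is from the text,
    % the others follow from the symmetry of Figure 1).
    move_type(J, t1) :- member(J, [1,3,7,9,16,18,22,24]).
    move_type(J, t2) :- member(J, [2,8,10,12,13,15,17,23]).
    move_type(J, t3) :- member(J, [4,6,19,21]).
    move_type(J, t4) :- member(J, [5,11,14,20]).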


    3. THE LEARNING PROCESS

When certain principles of play are well known and straightforward, it is useful to sidestep learning and provide this knowledge in some direct manner. MEL began as a class project in just this way, using static, programmed rules for play. The only way to increase the level of play was to write more (or better) rules. The advantage of rote learning3 is the speed with which the knowledge is gained. The disadvantages are the difficulties of preparing such knowledge initially and modifying it later.4

3 The classification of learning into rote learning, learning by being told, learning by analogy, learning from examples and learning by observation and discovery is due to [Carbonell, Michalski and Mitchell 83].

4 Rote learning still plays a small part in MEL. The programmed rules are still present in the guise of the machine player type. Also, MEL has been programmed with the definitions of legal moves.

    3.1. Learning by Example

The method of learning which MEL uses is learning by example. The examples are moves to or from a junction and are called events. Moves made during the placing or removing phases are each represented by a single event. Moves made during the moving or flying phases are represented by two events, one for the "from" portion and one for the "to" portion. Each event is a set of attributes. Attributes are such facts as the color of the moving player and the colors of each of the 24 junctions. Events representing the "from" portion of a moving or flying move have an additional attribute indicating the junction from which the piece was moved.

MEL organizes events according to move type and event type. Event types correspond to the game phases except that moving and flying are subdivided into their "from" and "to" portions. The event types are denoted p, r, m1, m2, f1 and f2 for placing, removing, moving from, moving to, flying from and flying to, respectively. Events of each type are further divided according to move type (t1, t2, t3 and t4).

    3.1.1. Recording Events

Recording an event is a complicated process. The board is first reoriented (normalized) so that the junction of the move (from or to) becomes one of the junctions 1, 2, 4 or 5. A move to junction 3, for example, causes the board to be rotated 90 degrees counterclockwise so that junction 3 is in the position of junction 1. The normalized event must then be checked to make sure it is not a duplicate of one already recorded. Events which differ merely in orientation are considered duplicates. Then the event must be checked for consistency. The database of events is consistent if each event belongs to only one of the four classes (t1, t2, t3 or t4). Inconsistencies arise when the same event is classified as, say, both a t1 and a t2 event. The implicit assumption is that for any given position (set of attributes) only one move (classification) is correct. If an event is consistent and unique, it is recorded. If a move is inconsistent, the original classification can be allowed to stand or the event can be reclassified.
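For illustration, the duplicate and consistency checks might look like this in Prolog (event/2 as a dynamic fact base and record_event/2 are assumed names, not MEL's actual predicates):

    :- dynamic event/2.      % event(Class, Attrs), Class in {t1,t2,t3,t4}

    % record_event(+Class, +Attrs): Attrs is an already normalized event.
    record_event(Class, Attrs) :-
        event(Class, Attrs), !.                % exact duplicate: ignore
    record_event(Class, Attrs) :-
        event(Other, Attrs),
        Other \== Class, !,
        format('inconsistent: already recorded as a ~w event~n', [Other]).
    record_event(Class, Attrs) :-
        assertz(event(Class, Attrs)).          % consistent and unique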

MEL obtains experience in the form of events from three different sources. It can observe a game and record the moves of either or both players. It can also be given specific examples by an external expert or teacher. Finally, MEL can provide its own examples by playing games and recording those moves which lead to some desired outcome. The next three sections deal with these sources individually.

3.1.2. Events from Observation

MEL allows six player types to compete in a game. Observing the moves of either or both players is one way MEL gains experience. The player should be consistent in his moves (and will be informed if he is not). The more skillful the player, the better the experience MEL will gain.

A human is one player type which can be observed. Moves are input via a mouse. If a move is inconsistent with a previous move, the human player determines which is correct. A skilled human player provides the best examples from observation. The drawback is that playing the number of games required to provide a good set of examples requires a good deal of patience.

Another observable player is the machine player, which uses a programmed (not learned) set of rules to generate moves. This player is not highly skilled, but it removes the tedium of human move generation and is suitable for obtaining examples of reasonable (if not brilliant) play. The machine player does fairly well in the placing phase and so is most useful for providing examples of play from that phase. It plays quite poorly in the other phases and the examples gained are of little value. The reason for the poor play is poor rules. Good rules for the moving and flying phases are particularly difficult to write, and this difficulty was the main inspiration for a program which could acquire knowledge automatically. When the machine player generates an inconsistent move, the original move is sustained.

The learner player type can also be observed. This player is identical to the human player, except that a list of moves generated from the learned rules is provided. The main value of this player type is for fine-tuning learned rules. For large sets of learned rules, however, generating the list of moves is quite time-consuming.

The learned player is the final player which can provide examples by observation. This player generates moves using the learned rules. The only reason to observe this player is to create a more robust set of examples. This should only be done when the learned rules are of good quality.

A player which cannot be observed is the random player. The random player generates legal moves, but since its play is inconsistent, observing it would be of little value. It does make a useful opponent for an observed player, generating moves which may be unsound, but which would never be encountered in a game between skilled players. The experimenter is the other player type which cannot be observed. This is because moves made by this player are not added to the database of events unless they lead to a favorable board position. Moves are generated from the learned rules. The experimenter player can be thought of as a learned player which is observed only when it is doing well. It will be discussed further in Section 3.1.4.

    3.1.3. Events from a Teacher

It was noted in Section 3 that direct implantation of knowledge is often useful. MEL allows examples of play to be presented directly. Events obtained in this manner are allowed to have a junction color attribute of "don't care" in addition to the usual white, black or empty. Suppose the teacher wishes to provide an example of white completing a mill to help MEL learn that concept. An example which might be provided is an empty junction 1 with junctions 2 and 3 white. The colors of the other junctions are of no interest and can be valued as don't-care. Examples provided by a teacher help reduce the number of events which must be recorded when a player is observed during a game. If the player makes a move which matches, attribute for attribute, a teacher-provided example, MEL can disregard the observed example since there is no new knowledge to be gained. (The don't-care matches any value.)
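Matching an observed move against a teacher example then reduces to a positionwise comparison. A minimal sketch, with dc as an assumed encoding for don't-care:

    % matches_example(+Example, +Observed): attribute lists of equal
    % length; a dc in the teacher's example matches any observed value.
    matches_example([], []).
    matches_example([dc|Ex], [_|Obs]) :- !,
        matches_example(Ex, Obs).
    matches_example([V|Ex], [V|Obs]) :-
        matches_example(Ex, Obs).

    % The example above (junction 1 empty, junctions 2 and 3 white):
    % ?- matches_example([e,w,w,dc,dc,dc,dc,dc], [e,w,w,b,e,b,b,b]).
    % true.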

    3.1.4. Events from Experimentation

The experimenter player type allows MEL to provide its own examples. Instead of recording moves as events as soon as they are observed, they are recorded as temporary events called t_events. When a favorable position is reached, a count associated with the t_event is incremented. When this count reaches a certain threshold, the t_event is rerecorded as a regular event. If the threshold were two, for example, no event would be recorded until it had twice led to a favorable position. (The idea of thresholds is due to R.S. Michalski.) A high value for the threshold provides high quality events at a slow pace. A low value provides many events of lesser quality. Definitions of favorable positions must be provided, but they can be as simple as a won game.
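The threshold mechanism amounts to a counter per temporary event. A sketch under assumed names (t_event/3 and credit_t_event/2 are illustrative, not MEL's predicates):

    :- dynamic event/2, t_event/3.     % t_event(Class, Attrs, Count)

    threshold(2).                      % promotion threshold (adjustable)

    % credit_t_event(+Class, +Attrs): called when the move has led to
    % a favorable position; promotes the t_event at the threshold.
    credit_t_event(Class, Attrs) :-
        (   retract(t_event(Class, Attrs, N))
        ->  N1 is N + 1
        ;   N1 = 1
        ),
        threshold(T),
        (   N1 >= T
        ->  assertz(event(Class, Attrs))       % rerecord as regular event
        ;   assertz(t_event(Class, Attrs, N1))
        ).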

    3.2. Codifying Experience

Having assimilated a number of events, MEL has a rather useless assemblage of knowledge. Unless a situation is encountered for which an event has already been recorded (an improbable event), MEL's experience is of no value. What is needed is some method of transferring the specific knowledge of the events to a more general form which can be applied to new situations. Specifically, once we have a set of events in which a corner move (t1) is appropriate, we want a description of this set which includes all of the t1 events and no events from the other three classes.

MEL transforms its knowledge using instance-to-class generalization. After a number of events have been collected, MEL uses program GEM to generate generalized rules for each event type. These rules describe the events of each move type and distinguish them from the events of the other three move types.

    t1-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  e  w  w  b  e  b  b  b
    2  w  e  w  w  e  w  e  e  w
    3  w  e  w  w  e  b  e  e  b

    t2-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  w  e  w  e  b  e  b  e
    2  w  w  e  e  b  w  e  e  w
    3  w  w  e  w  w  b  w  e  e
    4  w  e  e  e  e  w  e  e  w

    t3-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  e  e  e  e  w  w  e  e

    t4-events
    #  c  s1 s2 s3 s4 s5 s6 s7 s8
    1  w  w  b  b  e  w  e  w  e
    2  w  w  b  w  e  w  e  e  e

Figure 2. Example GEM input.

An example will make this clear. Figure 2 shows GEM input in the form of a relational table. The table represents a collection of events of type placing, grouped by move type. Each row is an event and each column contains values for a particular attribute. The attributes in the column labeled c are the colors of the moving player (white or black). The other attributes are the colors (white, black or empty) of the board junctions. The events in this example all show white placing to complete a mill (the boldfaced attributes). Thus, event #1 of the t1 events shows white moving to junction 1 when junctions 2 and 3 are white (which completes a white mill at 1-2-3). Only the first eight board junctions are shown, although in actual practice all 24 would be present.
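In Prolog, the three t1 rows of Figure 2 could be stored as facts such as these (the attribute=value list encoding is illustrative):

    % The t1 events of Figure 2 (junctions s9..s24 omitted, as there).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=b, s5=e, s6=b, s7=b, s8=b]).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=e, s5=w, s6=e, s7=e, s8=w]).
    event(t1, [c=w, s1=e, s2=w, s3=w, s4=e, s5=b, s6=e, s7=e, s8=b]).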

Figure 3 shows some possible GEM output for the example in Figure 2. A set of disjunctive complexes form the descriptions for each class. The complexes are formed from conjunctive selectors (the bracketed expressions). Thus, the description of a t2 move in English is: (1) junction 2 is empty and junctions 1 and 3 are white, or (2) junction 2 is empty and junctions 5 and 8 are white.
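Evaluating such a description is a disjunction of conjunctions. A sketch (forall/2 as in SWI-Prolog; the board-as-attribute-list representation is assumed):

    % satisfies(+Board, +Complex): every selector of the complex holds
    % on the board; both are lists of Attribute=Value pairs.
    satisfies(Board, Complex) :-
        forall(member(Sel, Complex), member(Sel, Board)).

    % The t2 description of Figure 3 as a disjunction of two complexes:
    t2_applies(Board) :-
        (   satisfies(Board, [s1=w, s2=e, s3=w])
        ;   satisfies(Board, [s2=e, s5=w, s8=w])
        ).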

The classification rules produced by GEM are generalizations because they describe more situations than did the original events. The salient features of t1 events, according to GEM, are that junction 1 is empty and junctions 2 and 3 are white. In addition, this description does not describe events of any other move type. If a new event is encountered which fits this general description, the general rule can be used to decide that a move to junction 1 is appropriate. If a new event is encountered which (according to the rules) belongs to more than one class, the rules are too general. If the new event is described by none of the rules, the rules are too specific. In either case, the new event should be added to the database of events and GEM rerun.

    t1-outhypo
    # cpx
    1  [s1=e] [s2=w] [s3=w]

    t2-outhypo
    # cpx
    1  [s1=w] [s2=e] [s3=w]
    2  [s2=e] [s5=w] [s8=w]

    t3-outhypo
    # cpx
    1  [s4=e] [s5=w] [s6=w]

    t4-outhypo
    # cpx
    1  [s2=w] [s5=e] [s8=w]
    2  [s2=w] [s5=e] [s8=w]

Figure 3. Example GEM output.

MEL does not use GEM output directly. The rules are translated into a single Prolog statement. The Prolog statement is a representation of an optimal binary tree for evaluating which complexes are satisfied (and thus which move to make). The nodes of the tree are the selectors and the branches represent whether or not the node was satisfied. The selector which appears at the top of any subtree is the selector which appears in the most complexes represented by that subtree. Therefore, the root node of the tree (which represents all complexes) is the selector which appears most often. In the example, the root node is the selector [s2=w] since it appears in three complexes. The left subtree under the root represents the rules in which [s2=w] appears and the right subtree those in which it does not appear. With this arrangement, the number of times each complex must be evaluated is minimized. The learned rules apply to a board in the normal orientation. The board may need to be reoriented to check for all rules which can be satisfied.
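The root-selection step can be sketched by counting, for each selector, the complexes that contain it (aggregate_all/3 and max_member/2 are SWI-Prolog library predicates; module translate's real code of course differs):

    % most_common_selector(+Complexes, -Sel): Sel occurs in the largest
    % number of complexes and becomes the root of the evaluation tree.
    most_common_selector(Complexes, Sel) :-
        findall(N-S,
                (   member(Cpx, Complexes), member(S, Cpx),
                    aggregate_all(count,
                                  ( member(C2, Complexes), member(S, C2) ),
                                  N)
                ),
                Counted),
        max_member(_-Sel, Counted).

    % ?- most_common_selector([[a=1,b=2], [a=1,c=3], [a=1]], Sel).
    % Sel = (a=1).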

    4. GEM

GEM (or more precisely GEM1.0 [Reinke 84]) is the latest in a series of induction programs developed by the Intelligent Systems Group at the University of Illinois at Urbana-Champaign. The AQ algorithm is at the heart of the various versions of GEM. Briefly, the AQ algorithm produces descriptions of classes of events. Each event is a vector of attribute values. The attribute values are discrete and belong to finite domains. For an easy-to-understand explanation of how the algorithm works, consult [Reinke 84]. For a detailed theoretical discussion see [Michalski 75].
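The covering idea at the heart of AQ can be caricatured in a few clauses. This is a deliberate simplification under assumed names, not GEM's algorithm: real AQ grows a star of alternative complexes from a seed event and chooses among them with an LEF, while this sketch specializes a single complex just far enough to exclude the negative events:

    % covers(+Complex, +Event): each selector of the complex holds in
    % the event (selectors here are single Attribute=Value tests).
    covers(Complex, Event) :-
        forall(member(Sel, Complex), member(Sel, Event)).

    % learn_complex(+Seed, +Negatives, -Complex): add selectors taken
    % from the positive seed event until no negative event is covered.
    learn_complex(Seed, Negatives, Complex) :-
        specialize(Seed, Negatives, [], Complex).

    specialize(_, Negatives, Acc, Acc) :-
        \+ ( member(Neg, Negatives), covers(Acc, Neg) ), !.
    specialize(Seed, Negatives, Acc, Complex) :-
        member(Sel, Seed),
        \+ member(Sel, Acc),
        specialize(Seed, Negatives, [Sel|Acc], Complex).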

GEM provides input and output for AQ which is geared for use by a knowledge engineer. As mentioned earlier, GEM input is in the form of relational tables and output is in the form of variable-valued logic expressions. Additional input information regarding attributes and their domains can be provided. GEM allows a cost to be associated with each attribute. This can be viewed as the cost for evaluating the attribute or as a measure of importance of an attribute, a lower value implying less cost or more importance. LEF's (Lexicographical Evaluation Functions) tell GEM what type of complexes to form. An LEF can specify short complexes, long complexes, or complexes utilizing the most important attributes. MEL takes advantage of this and places greater importance on the junctions which are closest to the normalized junctions 1, 2, 4 and 5. This provides a focus of attention in the rules near the move junctions. The attributes which are cheapest to evaluate are the color of the moving player and the "from" junction of a moving or flying move. MEL uses LEF's which dictate short, cheap complexes consisting of junctions near where a move is to be made.
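Such a preference could be sketched as a lexicographic comparison (the cost values and predicate names are invented for illustration; GEM's actual LEF machinery is richer):

    % Assumed attribute costs: the mover's color and the "from"
    % junction are cheapest; junctions near the normalized move
    % junctions come next, everything else last.
    cost(c, 1).
    cost(from, 1).
    cost(s1, 2). cost(s2, 2). cost(s4, 2). cost(s5, 2).
    cost(_, 3).                        % all other junctions

    total_cost(Complex, K) :-
        findall(W, ( member(A=_, Complex), once(cost(A, W)) ), Ws),
        sum_list(Ws, K).

    % lef_better(+C1, +C2): prefer fewer selectors, then lower cost.
    lef_better(C1, C2) :-
        length(C1, L1), length(C2, L2),
        (   L1 < L2
        ;   L1 =:= L2,
            total_cost(C1, K1), total_cost(C2, K2),
            K1 < K2
        ).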


    APPENDIX A

    USER'S GUIDE

MEL is written in UNSW Prolog [Sammut 83] and uses PROLOGRAPHICS [Channic 83] for graphics on a Sun Microsystems display. A stripped-down non-interactive version with no display was used for lengthy batch runs.

MEL is loaded from a window on the Sun by entering:

    gprolog -s300000 sunmill [event-file] [rule-file]

The request for such a huge amount of stack space (-s300000) is unnecessary if no GEM output will be translated to Prolog. A request of 50000 units will suffice if no translation is to be done. Due to the program's size, MEL takes about a minute to load. Optional files for events and learned rules may follow the program file and will extend the load time.

Most input is accomplished through a three-button mouse. Prompts are provided as to which button to push. Multiple choice questions with more than three alternatives are selected from menus by positioning the mouse over the desired choice and pressing any mouse button. Moves are selected in a similar manner by positioning the pointer at or near the desired junction or piece. UNSW Prolog commands must be entered from the keyboard. Remember that the sunwindow package requires the mouse pointer to be in the text window for keyboard input.

After MEL has finished loading, you will be prompted for gprolog commands. The command to begin playing is:

    play_game!

You will then be asked whether you want instructions. An affirmative answer scrolls a brief description of the game in the text window. The next thing to decide, from a menu of the six possible choices, is what types of players should compete. Refer to the descriptions of the player types in Section 3.1.2 to help you decide. Finally, you will be prompted as to whether either player's moves should be recorded. Remember that the moves of player types random and experimenter cannot be recorded. Play begins after this information has been entered.

If neither player requires human input, the game will progress on its own until one player wins or 50 moves have been made by each side. Players requiring human input (types human and learner) indicate their moves by placing the mouse pointer over the desired junction or piece and pressing a mouse button. If a player has a single choice of move, the move will be selected automatically. On any player's move, the single-item menu labeled "RETRACT MOVE" may be selected. This action returns the game to the state it was in before that player's last move. Events (regular and temporary) which were recorded for retracted moves are also erased. If the player was selecting the "to" portion of a moving or flying move, RETRACT MOVE returns the game to the start of the current move. After the game has concluded, you will again be prompted for gprolog commands.

In addition to the regular UNSW Prolog commands for editing files, etc., MEL has several useful commands which can be entered from the keyboard. They are listed below with a brief description.

play_game!            begin a game.

                      stops a game (or any other action). This is not usually dangerous unless it is done while something is being drawn on the screen.

learning(yes)!        gives you the menu for learning. See below for details.

save_events(file)!    saves all regular events in file.

clear_events!         removes all regular events.

save_t_events(file)!  saves all temporary events in file.

clear_t_events!       removes all temporary events.

save_rules(file)!     saves learned rules in file.


example!              allows the input of examples. See below for details.

    To provide specific examples of play, enter (in the text window):

    example!

You will be asked to select the game phase for which your example applies from a menu. After you have chosen the phase, a board will be drawn with a question mark at each junction. The question marks indicate that the color of that junction is irrelevant to the example and may have any value. Fill in the junctions that are relevant by positioning the mouse pointer at the junction and pressing button 1 for a white piece, 2 for an empty junction, and 3 for a black piece. The position which you create must have at least one legal move for the chosen phase. (For placing, moving or flying, this means you must specify at least one empty junction.) You may change the colors of junctions as many times as you wish. When you are finished, select the single-item menu labeled "END". If there is more than one legal move for the example you set up, you will be asked to input that move. This is done in the same way as a move in a game. You will then be asked whether another example is to be entered or to end example input.

After MEL has acquired some examples of play, either by observing some games or from specific examples you have provided, learned rules can be generated by entering:

    learning!

You will be asked whether you want to perform batch or incremental learning. Batch learning generates learned rules from scratch. Incremental learning uses existing GEM output to aid in generating the new rules. If no GEM output exists, the two types of learning will work identically. If GEM output does exist, incremental learning will be much faster, especially if there are only a few new events since the last rule generation. If in doubt, use incremental learning. GEM output is stored in files of the form *_gemout where the * is p, r, m1, m2, f1 or f2. Everything else is automatic. MEL will keep you informed of what it is doing (running GEM, translating). To save learned rules enter:


    save_rules(filename)!

where filename is the name of the file to which the learned rules are to be written. The Prolog version of a learned rule is highly unreadable. Use the GEM output directly to understand what the rules mean.

    The program consists of 21 modules, each consisting of about 100 lines of code. The names of the

    modules and a brief description of each are listed below.

Module experiment contains the routines for conducting experiments in play. It contains routines for manipulating temporary events (t_events) and recognizing good positions. The threshold value for the recording of temporary events is defined here and may be changed. Three definitions of good positions are given, but the user may wish to add others.

    Module gemsetup contains the routines for setting up and running GEM. It processes events from

    the database into the relational table format which is the input to GEM. It also contains the routines

    for writing learned rules to a file.

Module integrity is responsible for maintaining the integrity of the database of events. Events must belong to a single class and must be unique.

Module misc1 contains various low-level routines concerned with list manipulation. Module misc2 contains other low-level routines which are used by other modules.

Module orient contains definitions of the sixteen possible board orientations with routines for changing from one orientation to another.

Module record1 contains routines for maintaining the database of events, including writing events to a file, erasing events and recording events. The higher level functions of deciding whether an event should or should not be recorded are in modules integrity and sunrecord.

Module rules contains the hard-coded rules used by the machine player. Rules for moving at random and using human input are also expressed by rules in this module.


Module struct1 contains the definitions of basic board structures (junction, row, mill) used by other modules. Module struct2 contains higher level structures used only by the machine player.

Module sundraw contains the routines for drawing and undrawing screen objects via the PROLOGRAPHICS package. It also contains routines for converting screen coordinates to screen object numbers.

    Module sunexample contains routines for accepting and processing example moves.

Module sungame contains the routines for conducting a game. These include routines for initializing, moving and determining when a game is over.

    Module sunhuman is responsible for accepting and checking human input via the mouse such as

    moving and selection from menus. It also contains routines for adjusting the game when a move has

    been retracted.

Module sunlearn contains the routines for controlling batch and incremental learning. The actual routines for running GEM and interpreting GEM output are in modules gemsetup and translate, respectively.

Module sunmill loads all of the other modules and prints the introductory message.

    Module sunmove contains the routines for controlling move generation. It also handles removing

    pieces and tracking moves for possible retraction.

Modules sunobj1 and sunobj2 contain definitions of screen objects. In general, objects defined in sunobj1 are simpler than those defined in sunobj2. Objects in sunobj1 are used in defining objects in sunobj2.

Module sunrecord controls the recording of moves. It also informs the user of inconsistent moves when they are made. The actual checking, erasing and recording of the moves is done in modules record1 and integrity.


    Module translate processes GEM output in the form of variable-valued logic expressions into a

    Prolog statement which can be used by MEL. It contains routines for parsing GEM output, translating

    selectors and arranging the translated selectors in an optimal tree structure.

Module valid contains the definitions of valid moves.

Prolog is nearly self-documenting and the interested reader is invited to examine the program listing for specifics on MEL's operation. Every attempt has been made to make variable and predicate names mnemonic.

MEL uses several external files. Binary files exscreen and jscreen contain definitions of the example screen and the labeled board junctions, respectively. They can be generated by MEL dynamically, but this is slow. Events and their corresponding GEM rules are written to files of the form *_events and *_gemout, respectively, where * is p, r, m1, m2, f1, or f2. Prolog learned rules are built in file lrule_file. When incremental learning is being performed, output from the previous invocation of GEM is used as input to the current invocation and is stored in inhypo. File inhypoed contains the stream editor (sed) commands for changing GEM output to GEM input hypotheses. Files gemhead1 and gemhead2 contain static GEM input (everything but inhypo tables and event tables), for one and two part moves respectively.


    REFERENCES

Boulanger, A.B., "The Expert System Plant/cd: A Case Study in Applying the General Purpose Inference System ADVISE to Predicting Black Cutworm Damage in Corn," Report No. UIUCDCS-R-83-1134, Department of Computer Science, University of Illinois, 1983.

Channic, T., "PROLOGRAPHICS: A Graphics Interface for Prolog," unpublished, 1984.

Hoff, W., Michalski, R.S. and Stepp, R.E., "INDUCE-2: A Program for Learning Structural Descriptions from Examples," Report No. UIUCDCS-F-83-904, Department of Computer Science, University of Illinois, 1983.

Michalski, R.S., "Discovering Classification Rules Using Variable-Valued Logic System VL1," Third International Joint Conference on Artificial Intelligence, pp. 162-172, 1973.

Michalski, R.S., "Synthesis of Optimal and Quasi-Optimal Variable-Valued Logic Formulas," Proceedings of the 1975 International Symposium on Multiple-Valued Logic, pp. 76-87, 1975.

Michalski, R.S., "A Theory and Methodology of Inductive Learning," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 83-134, Tioga, Palo Alto, CA, 1983.

Michalski, R.S. and Chilausky, R.L., "Learning by Being Told and Learning from Examples: An Experimental Comparison of Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis," International Journal of Policy Analysis and Information Systems, Vol. 4, No. 2, pp. 125-160, 1980.

Michie, D., "Experiments on the Mechanization of Game Learning," Computer Journal, Vol. 25, No. 1, pp. 105-112, 1982.

Morehead, A.H. and Mott-Smith, G. (Eds.), Hoyle Up-to-date, Grosset & Dunlap, New York, NY, 1976.

Quinlan, J.R., "Discovering Rules from Large Collections of Examples: A Case Study," Expert Systems in the Micro-Electronic Age, Michie, D. (Ed.), pp. 168-201, Edinburgh University Press, Edinburgh, 1979.

Quinlan, J.R., "Learning Efficient Classification Procedures and their Application to Chess End Games," Machine Learning, Michalski, R.S., Carbonell, J. and Mitchell, T. (Eds.), pp. 463-481, Tioga, Palo Alto, CA, 1983.

Reinke, R.E., "A Structured Black-to-Win Decision Tree for the Chess Endgame KP vs. KR (pa7)," Internal Report, Intelligent Systems Group, Department of Computer Science, University of Illinois, 1982.

Reinke, R.E., "Knowledge Acquisition and Refinement Tools for the ADVISE Meta-Expert System," M.S. Thesis, Department of Computer Science, University of Illinois, 1984.

Scarne, J., Scarne's Encyclopedia of Games, pp. 532-533, Harper & Row, New York, NY, 1973.

Shapiro, A. and Niblett, T., "Automatic Induction of Classification Rules for a Chess Endgame," Advances in Computer Chess, Clarke, M.R. (Ed.), Edinburgh University Press, Edinburgh, 1982.

[Sample GEM output for the placing phase: rule sets t1-outhypo through t4-outhypo, each a list of disjunctive complexes over the junction attributes, followed by CPU and system time statistics.]
