
A Differential Evolution for the Tuning of a Chess Evaluation Function

Borko Bošković, Sašo Greiner, Janez Brest, Member, IEEE, and Viljem Žumer, Member, IEEE

Abstract: We describe an approach for tuning a chess program evaluation function. The general idea is based on the Differential Evolution algorithm. The tuning of the evaluation function has been implemented using only the final outcomes of games. Each individual of the population represents a chess program with specific (different) parameters of its evaluation function. The principal objective is to ascertain fitness values of individuals in order to promote them into successive generations. This is achieved by competition between the individuals of two populations, which also changes the individuals of both populations. The preliminary results show that fewer generations are necessary to obtain good (tuned) parameters. The acquired results have also shown that the population individuals (vectors) are not as diverse as they would be if no changes were made during the competition.

    I. INTRODUCTION

Game strategies, and particularly chess, have long been an area of research in artificial intelligence. More than the intelligence of chess programs themselves, it is the hardware and algorithm optimization that have improved. The first chess computer system ever to beat the human world champion, Kasparov, was Deep Blue (1997). Its speed was about 200 million positions per second.

For the past 50 years the stress has been on improving heuristic search techniques and evaluation functions. The evaluation function yields a static evaluation which is then employed in the search algorithm; it is therefore the most important part of a chess program. The efficiency of a chess program depends on the speed of its evaluation function and on the knowledge encoded in it. An evaluation function contains many expressions and weights, both of which have traditionally been formed by human experts. Because discovering optimal weights by hand is very difficult, we need a better way to find them. One possible method is automated tuning, or learning. Automated tuning in computer chess usually relies on algorithms such as hill climbing, simulated annealing, temporal difference learning [1], [2], and evolutionary algorithms [7], [5]. All these approaches enable tuning on the basis of the program's own experience or on the final results of chess game competitions: win, loss, or draw.

Our experiment is based on the principles of Darwinian evolution. We used the Differential Evolution (DE) algorithm [10] and the idea of competition between instances of the same chess program with different weights. During the evolutionary process we take into account the outcome of a specific game and the point score

Borko Bošković, Sašo Greiner, Janez Brest, and Viljem Žumer are with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, 2000 Maribor, Slovenia (e-mail: {borko.boskovic, saso.greiner, janez.brest, zumer}@uni-mb.si).

of the competition. This information is used in successive decisions on how to transform individuals between generations, as well as within a single generation, and it ultimately decides which individuals will survive into the next generation.

The article is structured as follows. Section II gives an overview of work dealing with tuning chess evaluation functions. Section III briefly describes the chess program that was tuned and its evaluation function. Section IV describes our evolutionary method for tuning the evaluation function. Section V presents two experiments with our method and their results. Section VI concludes the paper with final remarks.

    II. RELATED WORK

The pioneer of computer chess was Shannon (1949). He advocated the idea that computer chess programs would require an evaluation function to successfully play a game against human players [15]. In the beginning of computer chess the evaluation function was designed by hand by the developers. To find a good evaluation function, the developers first had to test it by playing numerous games and then modify it according to the produced results. Because this was a recurring cycle, finding a proper evaluation function was a difficult and very time-consuming task. This is the main reason why researchers have become involved in finding methods to automatically improve the parameters of the evaluation function. Samuel [14] developed a checkers program that iteratively tuned the weights of its evaluation function. The idea was to reduce the difference in evaluation between predecessor and successor positions. This notion was successfully applied in the programs NeuroChess [16] and KnightCap [1].

The principles of evolution have also been used in the tuning of a chess evaluation function. Kendall and Whitwell [7] presented one such approach using population dynamics. Fogel, Hays, Hahn, and Quon [5] presented an evolutionary algorithm which managed to improve a chess program by almost 400 rating points. All of these approaches use a population which consists of evaluation functions. As is usual with evolutionary approaches, the offspring are generated using mutation, crossover, and selection operators.

    III. THE CHESS PROGRAM

We use an evolutionary algorithm to tune the chess program that we developed recently. The chess program is very fast and contains a simple evaluation function. The core of our chess program is the Bitboard game representation

0-7803-9487-9/06/$20.00 ©2006 IEEE
2006 IEEE Congress on Evolutionary Computation, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006


and well-known chess search algorithms such as alpha-beta with a transposition table, quiescence search, null-move pruning, aspiration search, and iterative deepening [6], [3]. During the evolutionary process our chess program used the following simplified version of the evaluation function:

eval = X_m (M_white - M_black) + Σ_{y=0}^{5} X_y (N_{y,white} - N_{y,black})    (1)

In this equation X_y represents the weight of each piece type except the king, and X_m the mobility weight. M denotes the mobility (number of available moves) of the white or black pieces. N_y is the number of pieces of a specific type (e.g., the number of white pawns). The principal reason for using such a simple and straightforward evaluation function is that we can demonstrate that the weight parameters of the function can be tuned by applying the method we developed. In addition, we also present the behaviour and features of our method. At the beginning of the game, the program uses an opening book which includes knowledge that enables the program to play perfectly in the opening. When the program has more than one option in the opening book, it randomly selects one. This is important because the tuning process tunes the weights according to all opening positions in the opening book.
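As a concrete illustration, the evaluation in (1) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the data layout, and the set of five tunable piece types are our own choices, not the program's actual code.

```python
# Hypothetical sketch of the simplified evaluation function (1); names and
# data layout are assumptions, not the authors' actual implementation.

PIECE_TYPES = ("pawn", "knight", "bishop", "rook", "queen")  # king excluded

def evaluate(x_mobility, x_piece, mobility, counts):
    """x_piece[y] is the weight of piece type y; mobility and counts hold
    per-side values, e.g. counts["white"]["pawn"]."""
    score = x_mobility * (mobility["white"] - mobility["black"])
    for y, piece in enumerate(PIECE_TYPES):
        score += x_piece[y] * (counts["white"][piece] - counts["black"][piece])
    return score
```

With chess-theory weights, a position where White has one extra pawn (weight 100) and two extra available moves (mobility weight 10) evaluates to 10·2 + 100·1 = 120.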

    IV. METHOD

Our method is based on the Differential Evolution algorithm. DE is a floating-point encoding evolutionary algorithm for global optimization over continuous spaces [10], [8], [9], [13], [11]. Each population contains NP D-dimensional vectors X_{G,i} whose parameter values represent the weights of the chess evaluation function:

X_{G,i} = {X_{G,i,1}, X_{G,i,2}, ..., X_{G,i,D}},  i = 1, 2, ..., NP

During the evolutionary process, in each generation DE employs mutation, crossover, and selection operations. Our method adds the idea of competition, as shown in the algorithm below. Competition is used to calculate the fitness of individuals and additionally transforms individuals as part of their tuning.

Initialization(P0);
while (continue tuning) {
    PV = Mutation(PG, F);
    PU = Crossover(PG, PV, CR);
    Competition(PG, PU, N, L);
    PG+1 = Selection(PG, PU);
}

In this algorithm, P0 denotes the initial population, PV the mutant population, PU the trial population, and PG+1 the population in the next generation. F, CR, N, and L are control parameters defined by the user.
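A compact, runnable skeleton of this loop is sketched below under stated assumptions: the chess competition is replaced by a toy scoring rule (a vector's points are simply its parameter sum), so the competition parameters N and L do not appear, and all names are illustrative rather than the authors' code.

```python
import random

# Toy skeleton of the tuning loop: initialization, rand/2 mutation, binomial
# crossover, a stand-in "competition", and selection. Illustrative only.

def tune(NP=10, D=5, F=0.5, CR=0.9, generations=3, low=0.0, high=1000.0):
    # P0: uniform random initialization between the bounds.
    P = [[random.uniform(low, high) for _ in range(D)] for _ in range(NP)]
    for _ in range(generations):
        # Mutation (rand/2): each mutant combines five distinct other vectors.
        V = []
        for i in range(NP):
            r1, r2, r3, r4, r5 = random.sample(
                [k for k in range(NP) if k != i], 5)
            V.append([P[r1][j] + F * (P[r2][j] - P[r3][j])
                               + F * (P[r4][j] - P[r5][j]) for j in range(D)])
        # Binomial crossover; out-of-bound parameters are set to the bounds.
        U = []
        for i in range(NP):
            jrand = random.randrange(D)
            u = [V[i][j] if random.random() <= CR or j == jrand else P[i][j]
                 for j in range(D)]
            U.append([min(max(p, low), high) for p in u])
        # Toy "competition": a vector's points are its parameter sum, standing
        # in for the points collected over N games in the real method.
        pts_P = [sum(x) for x in P]
        pts_U = [sum(u) for u in U]
        # Selection: a trial vector survives only with strictly more points.
        P = [U[i] if pts_U[i] > pts_P[i] else P[i] for i in range(NP)]
    return P
```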

    A. Initialization

At the beginning, the population P0 is initialized with parameter values distributed uniformly at random between the parameter bounds (X_{j,low}, X_{j,high}; j = 1, 2, ..., D). The bound values are problem specific.

    B. Mutation

Mutation generates the mutant population PV from the current population PG using a mutation strategy. For each vector of the current population, mutation (using one of the mutation strategies) creates a mutant vector V_{G,i}, which is an individual of the mutant population:

V_{G,i} = {V_{G,i,1}, V_{G,i,2}, ..., V_{G,i,D}},  i = 1, 2, ..., NP

DE includes various mutation strategies for global optimization. In our approach we used the rand/2 mutation strategy, which is given by the equation

V_{G,i} = X_{G,r1} + F (X_{G,r2} - X_{G,r3}) + F (X_{G,r4} - X_{G,r5})    (2)

The indexes r1, r2, r3, r4, and r5 are random, mutually different integers generated within the range [1, NP], all different from the index i. F is a mutation scale factor within the range [0, 2], but usually less than 1.0.
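The rand/2 strategy in (2) can be sketched as a small function; the names are illustrative assumptions.

```python
import random

# Sketch of the rand/2 mutation strategy in (2).

def rand2_mutant(P, i, F):
    """Build a mutant vector from five mutually different population vectors,
    all with indexes different from i."""
    r1, r2, r3, r4, r5 = random.sample([k for k in range(len(P)) if k != i], 5)
    return [P[r1][j] + F * (P[r2][j] - P[r3][j]) + F * (P[r4][j] - P[r5][j])
            for j in range(len(P[i]))]
```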

    C. Crossover

After mutation, a binary crossover forms the trial population PU. From the i-th population vector and its corresponding mutant vector, crossover creates a trial vector U_{G,i} with the following rule:

U_{G,i} = {U_{G,i,1}, U_{G,i,2}, ..., U_{G,i,D}}

U_{G,i,j} = V_{G,i,j}  if rand_j(0, 1) ≤ CR or j = j_rand,
U_{G,i,j} = X_{G,i,j}  otherwise,

i = 1, 2, ..., NP;  j = 1, 2, ..., D
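The crossover rule above can be sketched as follows; the function name is an illustrative assumption.

```python
import random

# Sketch of binomial crossover: mix target vector x with mutant v; the index
# jrand guarantees at least one parameter is taken from the mutant.

def bin_crossover(x, v, CR):
    jrand = random.randrange(len(x))
    return [v[j] if random.random() <= CR or j == jrand else x[j]
            for j in range(len(x))]
```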

CR is a crossover factor within the range [0, 1) and gives the probability that a parameter of the trial vector is taken from the mutant vector. The index j_rand is a randomly chosen integer within the range [1, D] and ensures that the trial vector contains at least one parameter from the mutant vector. After crossover, the parameters of a trial vector may be out of the bounds (X_{j,low}, X_{j,high}). In this case the parameters can be mapped back inside the interval, set to the bounds, or used as they are.

D. Competition

Once we have a trial population PU, we have to evaluate its individuals through competition between the individuals of the current and trial populations. Members collect points which represent their fitness values. Each individual of the current population plays a specific number of games (N) against randomly chosen individuals of the trial population. An individual gets 2 points for a win, 1 for a draw, and 0 for a loss. An individual wins when its opponent is checkmated. A game is a draw if


the position is a known drawn position, if the same position occurs three times in one game, or because of the 50-move rule. Games are limited to 150 moves for both players; if a game reaches 150 moves, the result is a draw. An individual loses if its opponent wins. After each game, the vector that lost is transformed according to the following rule:

X_{loser,j} = X_{loser,j} + L · rand(0, 1) · (X_{winner,j} - X_{loser,j}),  j = 1, 2, ..., D

where X_winner denotes the vector which won and X_loser the vector which lost the game. The parameter L is a learning parameter within the range [0, 1) and is responsible for reducing the distance between the winning and losing vectors.
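The competition step can be sketched as follows. Here play_game is a stand-in for an actual chess match between two weight vectors, and all names are assumptions of this sketch.

```python
import random

# Sketch of the competition step: each current-population vector plays N games
# against randomly chosen trial vectors, members collect 2/1/0 points, and
# after every game the loser is pulled toward the winner.

def pull_toward(loser, winner, L):
    """Transform the losing vector (in place) according to the rule above."""
    for j in range(len(loser)):
        loser[j] += L * random.random() * (winner[j] - loser[j])

def competition(P, U, N, L, play_game):
    """play_game(a, b) should return 1 if a wins, 0 on a draw, -1 if b wins."""
    pts_P = [0] * len(P)
    pts_U = [0] * len(U)
    for i, x in enumerate(P):
        for _ in range(N):
            k = random.randrange(len(U))
            result = play_game(x, U[k])
            if result > 0:            # current vector won
                pts_P[i] += 2
                pull_toward(U[k], x, L)
            elif result < 0:          # trial vector won
                pts_U[k] += 2
                pull_toward(x, U[k], L)
            else:                     # draw
                pts_P[i] += 1
                pts_U[k] += 1
    return pts_P, pts_U
```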

    E. Selection

The selection operation selects according to the points collected in the competition between the i-th population vector and its corresponding trial vector. Selection dictates which vector will survive into the next generation. Since ours is a maximization problem, we used the following selection rule:

X_{G+1,i} = U_{G,i}  if points(U_{G,i}) > points(X_{G,i}),
X_{G+1,i} = X_{G,i}  otherwise,

i = 1, 2, ..., NP
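The selection rule can be sketched in one line; the points are those collected during the competition, and the names are illustrative.

```python
# Sketch of the selection rule: a trial vector replaces its target only when
# it collected strictly more points.

def selection(P, U, pts_P, pts_U):
    return [U[i] if pts_U[i] > pts_P[i] else P[i] for i in range(len(P))]
```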

    V. EXPERIMENT

In the first experiment we tuned the weights of the pieces and of mobility, as shown in (1), without tuning during the competition (L = 0.0). The population size NP was 10, because a larger NP would substantially increase the number of games required in one generation. The number of games N played per current-population vector was also 10. The scale factor F was set to 0.5 and the crossover factor CR was set to 0.9. These values for CR and F were taken from the literature, where they are considered good for global optimization. The mutation strategy was rand/2, because choosing another strategy for such a small population could lead into a local optimum. Initialization was uniformly random between the bounds X_{j,low} = 0 and X_{j,high} = 1000. The pawn weight was fixed at 100. Parameters that were out of bounds after crossover were set to the bound values. The average values of the initialized generation were:

X_pawn = 100     X_rook = 645
X_knight = 542   X_queen = 463
X_bishop = 417   X_mobility = 495

The standard deviations of the parameters were:

σ_pawn = 0       σ_rook = 316
σ_knight = 316   σ_queen = 282
σ_bishop = 225   σ_mobility = 265

During 50 generations, 5000 games were played, with the search algorithm going to a depth of 4 ply. The average values and standard deviations of the parameters are shown in Figures 1 and 2. At the end of the experiment we obtained a population with the following average parameter values:

X_pawn = 100     X_rook = 440
X_knight = 264   X_queen = 920
X_bishop = 312   X_mobility = 7

and the corresponding standard deviations:

σ_pawn = 0      σ_rook = 171
σ_knight = 98   σ_queen = 58
σ_bishop = 99   σ_mobility = 4

The acquired results show the relative values of the pieces and of mobility, which are in accordance with known values from chess theory.

In the second experiment we changed the learning parameter L to 0.25 and the scale factor F to 1.5. The parameter L enables additional tuning of the parameters during the competition and is responsible for the convergence of poor individuals toward those with better parameters. Because L causes quicker convergence, we use the parameter F, which diversifies the vectors, as a remedy against local optima. The remaining parameters were the same as in the first experiment. The average values of the initialized generation were:

X_pawn = 100     X_rook = 410
X_knight = 796   X_queen = 595
X_bishop = 677   X_mobility = 520

and the corresponding standard deviations:

σ_pawn = 0       σ_rook = 272
σ_knight = 117   σ_queen = 281
σ_bishop = 253   σ_mobility = 269

After 50 generations we obtained a population with the following average parameter values:

X_pawn = 100     X_rook = 488
X_knight = 247   X_queen = 801
X_bishop = 293   X_mobility = 7

and the corresponding standard deviations:

σ_pawn = 0      σ_rook = 18
σ_knight = 21   σ_queen = 68
σ_bishop = 16   σ_mobility = 1

The average values of the parameters and their corresponding standard deviations during the 50 generations are shown in Figures 3 and 4. From the acquired results we can see that with tuning during the competition, the final population has a smaller standard deviation. Figures 1–4 depict the standard deviations and average parameter values, which additionally indicate that DE with tuning during the competition has better convergence. To examine this, we compared the relative efficiency of the best individual in each generation, for both experiments, against the following parameters taken from chess theory:


Fig. 1. Average parameter values of populations during 50 generations without tuning during competition. [Plot: generation (0–50) vs. average parameter value (0–1000) for mobility, knight, bishop, rook, and queen.]

Fig. 2. Standard deviation of parameters in populations during 50 generations without tuning during competition. [Plot: generation (0–50) vs. standard deviation (0–400) for mobility, knight, bishop, rook, and queen.]

X_pawn = 100     X_rook = 500
X_knight = 300   X_queen = 900
X_bishop = 330   X_mobility = 10

Relative efficiency was measured from the results of 100 games between the best individual's parameters and the aforementioned parameters. Each player played 50 games as white and 50 games as black.

To make the results more representative, both experiments were repeated 10 times. The approximated average values of relative efficiency for both experiments are depicted in Figure 5. Both experiments started out with identical populations, but the relative efficiency of our approach, as seen from the figure, is better than that of DE without tuning during the competition.


Fig. 3. Average parameter values of populations during 50 generations with tuning during competition. [Plot: generation (0–50) vs. average parameter value (0–900) for mobility, knight, bishop, rook, and queen.]

Fig. 4. Standard deviation of parameters in populations during 50 generations with tuning during competition. [Plot: generation (0–50) vs. standard deviation (0–300) for mobility, knight, bishop, rook, and queen.]

    VI. CONCLUSIONS

We have proposed a method for tuning an evaluation function which is based on the Differential Evolution algorithm. We chose the DE/rand/2/bin strategy which, during 50 generations, yielded a population with good parameters. Because these parameters had a relatively large standard deviation, we decided to apply additional tuning during the competition in order to reduce the standard deviation and improve convergence.

The results show that the evaluation function was successfully improved over the initial population in both experiments. The weights of the evaluation function correspond to known values from chess theory. The second experiment used tuning during the competition, which allowed DE to obtain good parameters faster than in the first experiment.


Fig. 5. Average relative efficiency over 10 runs for DE with (F=1.5, L=0.25) and without (F=0.5, L=0.00) learning during competition in the evolutionary process. In each generation, 100 games were played between the best individual and the parameters taken from chess theory. [Plot: generation (0–50) vs. relative efficiency (0–50).]

    In our experiments we used a simple evaluation function.

This evaluation function enables the comparison of results with known good parameters from chess theory. In this way we can conclude that our method enables the tuning of a chess evaluation function using only the final outcomes of games. In the future we will therefore try to develop a strong chess program with a more complex evaluation function and our evolutionary method. Because our method is, similarly to DE, very sensitive to control parameters, we will also focus on a self-adapting [4], [12] DE algorithm for the tuning of the chess evaluation function.

    REFERENCES

[1] Jonathan Baxter, Andrew Tridgell, and Lex Weaver. Experiments in parameter learning using temporal differences. International Computer Chess Association Journal, 21(2):84–99, 1998.

[2] Jonathan Baxter, Andrew Tridgell, and Lex Weaver. Learning to play chess using temporal differences. Machine Learning, 40(3):243–263, 2000.

[3] B. Bošković, S. Greiner, J. Brest, and V. Žumer. The representation of chess game. In Proceedings of the 27th International Conference on Information Technology Interfaces, pages 381–386, 2005.

[4] J. Brest, S. Greiner, B. Bošković, M. Mernik, and V. Žumer. Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems. IEEE Transactions on Evolutionary Computation. Accepted.

[5] David B. Fogel, Timothy J. Hays, Sarah L. Hahn, and James Quon. A self-learning evolutionary chess program. Proceedings of the IEEE, 92(12):1947–1954, 2004.

[6] E. A. Heinz. Scalable Search in Computer Chess: Algorithmic Enhancements and Experiments at High Search Depths. Morgan Kaufmann Publishers, 1999.

[7] Graham Kendall and Glenn Whitwell. An evolutionary approach for the tuning of a chess evaluation function using population dynamics. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC 2001), pages 995–1002, Seoul, Korea, 2001. IEEE Press.

[8] J. Liu and J. Lampinen. Adaptive Parameter Control of Differential Evolution. In Proceedings of the 8th International Conference on Soft Computing (MENDEL 2002), pages 19–26, 2002.

[9] J. Liu and J. Lampinen. On Setting the Control Parameter of the Differential Evolution Method. In Proceedings of the 8th International Conference on Soft Computing (MENDEL 2002), pages 11–18, 2002.

[10] K. Price and R. Storn. Differential Evolution: A Simple Evolution Strategy for Fast Optimization. Dr. Dobb's Journal of Software Tools, 22(4):18–24, April 1997.

[11] K. V. Price, R. M. Storn, and J. A. Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Springer, 2005.

[12] A. K. Qin and P. N. Suganthan. Self-adaptive Differential Evolution Algorithm for Numerical Optimization. In The 2005 IEEE Congress on Evolutionary Computation (CEC 2005), volume 2, pages 1785–1791. IEEE Press, Sept. 2005. DOI: 10.1109/CEC.2005.1554904.

[13] J. Rönkkönen, S. Kukkonen, and K. V. Price. Real-Parameter Optimization with Differential Evolution, 2005.

[14] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, (3):211–229, 1959.

    [15] C. Shannon. Programming a computer for playing chess. PhilosophicalMagazine, 41(4):256, 1950.

[16] Sebastian Thrun. Learning to play the game of chess. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 1069–1076. The MIT Press, Cambridge, MA, 1995.
