A Differential Evolution for the Tuning of a Chess Evaluation
Function
Borko Bošković, Sašo Greiner, Janez Brest, Member, IEEE, and Viljem Žumer, Member, IEEE
Abstract: We describe an approach for tuning a chess program's evaluation function. The general idea is based on the Differential Evolution algorithm. The tuning of the evaluation function has been implemented using only the final outcomes of games. Each individual of the population represents a chess program with specific (different) parameters of its evaluation function. The principal objective is to ascertain the fitness values of individuals in order to promote them into successive generations. This is achieved by competition between the individuals of two populations, which also changes the individuals of both populations. The preliminary results show that fewer generations are necessary to obtain good (tuned) parameters. The acquired results have shown that the population individuals (vectors) are not as diverse as they would be if no changes were made during the competition.
I. INTRODUCTION
Game strategies, and chess in particular, have long been an area of research in artificial intelligence. More than the intelligence of chess programs themselves, it is the hardware and the optimization of algorithms that have improved. The first chess computer system ever to beat a human world champion, Kasparov, was Deep Blue (1997). Its speed was about 200 million positions per second.
For the past 50 years the stress has been on improving heuristic search techniques and evaluation functions. The evaluation function yields a static evaluation which is then employed in the search algorithm. Therefore the evaluation function is the most important part of a chess program. The efficiency of a chess program depends on the speed of the evaluation function and on the knowledge it encodes. An evaluation function contains many expressions and weights, both of which were formed by human experts. But because discovering optimal weights is very difficult, we need a better way to find them. One possible method is automated tuning, or learning. When we talk about automated tuning in computer chess, the focus is on algorithms such as hill climbing, simulated annealing, temporal difference learning [1], [2], and evolutionary algorithms [7], [5]. All of these approaches enable tuning on
the basis of the program's own experiences or the final results of chess game competitions: win, loss, or draw. Our experiment is based on the principles of Darwinian evolution. We used the Differential Evolution (DE) [10] algorithm and the idea of competition between identical chess programs with different weights. During the evolution process we take into account the outcome of a specific game and the point score of the competition. This information is used in successive decisions on how to transform individuals between generations as well as within a single generation, and ultimately decides which individuals will survive into the next generation.

Borko Bošković, Sašo Greiner, Janez Brest, and Viljem Žumer are with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, 2000 Maribor, Slovenia (email: {borko.boskovic, saso.greiner, janez.brest, zumer}@uni-mb.si).
The article is structured as follows. Section II gives an overview of related work on tuning chess evaluation functions. Section III briefly describes the chess program that was tuned and its evaluation function. Section IV describes our evolutionary method for tuning the evaluation function. Section V presents two experiments with our method and their results. Section VI concludes the paper with final remarks.
II. RELATED WORK
The pioneer of computer chess was Shannon (1949). He advocated the idea that computer chess programs would require an evaluation function to successfully play a game against human players [15]. In the beginnings of computer chess the evaluation function was designed by hand by the developers. To find a good evaluation function, the developers first had to test it by playing numerous games and then modify it according to the produced results. Because this was a recurring cycle, finding a proper evaluation function was a difficult and very time-consuming task. This is the main reason why researchers became involved in finding a method to automatically improve the parameters of the evaluation function. Samuel [14] developed a checkers program that iteratively tuned the weights of the evaluation function. The idea was to reduce the difference in evaluation between predecessor and successor positions. This notion was successfully applied in the programs NeuroChess [16] and KnightCap [1].
The principles of evolution have also been used in the tuning of a chess evaluation function. Kendall and Whitwell [7] presented one such approach using population dynamics. Fogel, Hays, Hahn, and Quon [5] presented an evolutionary algorithm which managed to improve a chess program by almost 400 rating points. All of these approaches use a population which consists of evaluation functions. As is usual with evolutionary approaches, the offspring are generated using mutation, crossover, and selection operators.
III. THE CHESS PROGRAM
We use an evolutionary algorithm to tune the chess program that we developed recently. The chess program is very fast and contains a simple evaluation function. The core of our chess program is the Bitboard game representation
0-7803-9487-9/06/$20.00 (c)2006 IEEE
2006 IEEE Congress on Evolutionary Computation, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006
and well-known chess search algorithms such as alpha-beta with a transposition table, quiescence search, null-move pruning, aspiration search, and iterative deepening [6], [3]. During the evolutionary process our chess program used the following simplified version of the evaluation function:
eval = X_m (M_white - M_black) + Σ_{y=0}^{5} X_y (N_y,white - N_y,black)    (1)
In this equation X_y denotes the weights of all pieces except the king, and X_m the mobility weight. M denotes the mobility (number of available moves) of the white or black pieces. N_y is the number of pieces of a specific type (i.e. the number of white pawns). The principal reason for using such a simple and straightforward evaluation function is that we can demonstrate that the weight parameters of the function can be tuned by applying the method we developed. In addition, we also present the behaviour and features of our method. At the beginning of the game, the program uses an opening book which includes knowledge that enables the program to play perfectly in the opening. When the program has several options in the opening book, it randomly selects one. This is important because the tuning process tunes the weights according to all opening positions in the opening book.
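As a concrete illustration, the simplified evaluation function (1) can be sketched in Python; the dictionary-based position encoding and the piece names below are illustrative assumptions, not the authors' actual Bitboard implementation.

```python
# Sketch of the simplified evaluation function (1): a weighted material
# difference plus a weighted mobility difference. The dictionaries are a
# hypothetical stand-in for the authors' Bitboard position representation.

PIECE_TYPES = ["pawn", "knight", "bishop", "rook", "queen"]  # king excluded

def evaluate(weights, counts_white, counts_black, mobility_white, mobility_black):
    """weights maps 'mobility' and each piece type to its tuned parameter."""
    score = weights["mobility"] * (mobility_white - mobility_black)
    for piece in PIECE_TYPES:
        score += weights[piece] * (counts_white[piece] - counts_black[piece])
    return score
```

With chess-theory weights (pawn 100, knight 300, bishop 330, rook 500, queen 900, mobility 10), a position where white is up one knight with equal mobility evaluates to 300.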
IV. METHOD
Our method is based on the Differential Evolution algorithm. DE is a floating-point encoding evolutionary algorithm for global optimization over continuous spaces [10], [8], [9], [13], [11]. Each population contains NP D-dimensional vectors X_G,i whose parameter values represent the weights of the chess evaluation function:

X_G,i = {X_G,i,1, X_G,i,2, ..., X_G,i,D},  i = 1, 2, ..., NP
During the evolutionary process, in each generation DE employs the mutation, crossover, and selection operations. Our method adds the idea of competition, as shown in the algorithm below. Competition is used to calculate the fitness of individuals and additionally transforms individuals for their tuning.
Initialization(P0);
while (continue tuning) {
    PV = Mutation(PG, F);
    PU = Crossover(PG, PV, CR);
    Competition(PG, PU, N, L);
    PG+1 = Selection(PG, PU);
}
In the algorithm, P0 denotes the initial population, PV the mutant population, PU the trial population, and PG+1 the population in the next generation. F, CR, N, and L are control parameters defined by the user.
A. Initialization
At the beginning, the population P0 is initialized with parameter values distributed uniformly at random between the parameter bounds (X_j,low, X_j,high; j = 1, 2, ..., D). The bound values are problem specific.
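A minimal sketch of this initialization step, assuming each individual is simply a list of D floats (the function name is our own):

```python
import random

def initialize(NP, lows, highs):
    """Create NP vectors whose j-th parameter is drawn uniformly at random
    from the problem-specific bounds (X_j,low, X_j,high)."""
    return [[random.uniform(low, high) for low, high in zip(lows, highs)]
            for _ in range(NP)]
```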
B. Mutation
Mutation generates the mutant population PV from the current population PG using a mutation strategy. For each vector of the current population, mutation (using one of the mutation strategies) creates a mutant vector V_G,i which is an individual of the mutant population:

V_G,i = {V_G,i,1, V_G,i,2, ..., V_G,i,D},  i = 1, 2, ..., NP
DE includes various mutation strategies for global optimization. In our approach we used the rand/2 mutation strategy, which is given by the equation
V_G,i = X_G,r1 + F (X_G,r2 - X_G,r3) + F (X_G,r4 - X_G,r5)    (2)
The indices r1, r2, r3, r4, r5 are randomly generated, mutually different integers within the range [1, NP], all different from the index i. F is a mutation scale factor within the range [0, 2], but it is usually less than 1.0.
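The rand/2 strategy of equation (2) can be sketched as follows, with vectors as plain Python lists (the function name is our own):

```python
import random

def mutate_rand2(population, F):
    """For each target index i, pick five mutually different random indices
    r1..r5 (all different from i) and build the mutant
    V_i = X_r1 + F*(X_r2 - X_r3) + F*(X_r4 - X_r5).
    Requires a population of at least six vectors."""
    NP, D = len(population), len(population[0])
    mutants = []
    for i in range(NP):
        r1, r2, r3, r4, r5 = random.sample([r for r in range(NP) if r != i], 5)
        mutants.append([population[r1][j]
                        + F * (population[r2][j] - population[r3][j])
                        + F * (population[r4][j] - population[r5][j])
                        for j in range(D)])
    return mutants
```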
C. Crossover
After mutation, a binary crossover forms the trial population PU. From the i-th population vector and its corresponding mutant vector, crossover creates trial vectors U_G,i with the following rule:
U_G,i = {U_G,i,1, U_G,i,2, ..., U_G,i,D}

U_G,i,j = V_G,i,j  if rand_j(0, 1) <= CR or j = j_rand,
          X_G,i,j  otherwise.

i = 1, 2, ..., NP;  j = 1, 2, ..., D
CR is a crossover factor within the range [0, 1) and is the probability of creating a parameter of the trial vector from the mutant vector. The index j_rand is a randomly chosen integer within the range [1, D] and ensures that the trial vector contains at least one parameter from the mutant vector. After crossover, the parameters of a trial vector may be out of bounds (X_j,low, X_j,high). In this case the parameters can be mapped inside the interval, set to the bounds, or used as they are.

D. Competition
When we have a trial population PU, we have to evaluate its individuals through competition between the individuals of the current and the trial population. Members collect points which represent their fitness value. Each individual of the current population plays a specific number of games (N) against randomly chosen individuals of the trial population. An individual gets 2 points for winning, 1 for a draw, and 0 for losing. An individual wins when its opponent is in a mate position. A game is a draw if
the position is a known draw position, if the same position occurs three times in one game, or because of the 50-move rule. Games are limited to 150 moves for both players; therefore, if a game reaches 150 moves, the result is a draw. An individual loses if its opponent wins. After each game, the vector that lost is transformed according to the following rule:
X_loser,j = X_loser,j + L * rand(0, 1) * (X_winner,j - X_loser,j),  j = 1, 2, ..., D
where X_winner denotes the vector which won and X_loser the vector which lost the game. The parameter L is a learning parameter within the range [0, 1) and is responsible for reducing the distance between the winning and losing vectors.
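The loser-transformation rule above can be sketched as follows (the function name is our own):

```python
import random

def pull_loser_toward_winner(winner, loser, L):
    """Move each parameter of the losing vector a random fraction (scaled by
    the learning parameter L) of the way toward the winning vector:
    X_loser,j += L * rand(0,1) * (X_winner,j - X_loser,j)."""
    return [l + L * random.random() * (w - l) for w, l in zip(winner, loser)]
```

With L = 0 the loser is left unchanged (as in the first experiment); with L close to 1 the loser can move almost all the way to the winner.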
E. Selection
The selection operation selects according to the points collected in the competition between the i-th population vector and its corresponding trial vector. Selection dictates which vector will survive into the next generation. Since ours is a maximization problem, we used the following selection rule:
X_G+1,i = U_G,i  if points(U_G,i) > points(X_G,i),
          X_G,i  otherwise.

i = 1, 2, ..., NP
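Putting the remaining operators together, binomial crossover and the points-based selection can be sketched as follows (the function names and the points bookkeeping are our own illustration, not the authors' code):

```python
import random

def crossover(target, mutant, CR):
    """Binomial crossover: take the mutant parameter when rand_j(0,1) <= CR,
    and always at one random index j_rand; otherwise keep the target's."""
    D = len(target)
    j_rand = random.randrange(D)
    return [mutant[j] if (random.random() <= CR or j == j_rand) else target[j]
            for j in range(D)]

def select(target, trial, target_points, trial_points):
    """The trial vector survives only if it collected strictly more points
    in the competition than its corresponding target vector."""
    return trial if trial_points > target_points else target
```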
V. EXPERIMENT
In the first experiment we tuned the weights of the pieces and of mobility, as shown in (1), without tuning during competition (L = 0.0). The population size NP was 10, because a larger NP would substantially increase the number of games required in one generation. The number of games N played per current-population vector was also 10. The scale factor F was set to 0.5 and the crossover factor CR was set to 0.9. These values of the parameters CR and F were taken from the literature, where they are considered to be good for global optimization. The mutation strategy was rand/2, because choosing another strategy for such a small population could lead to a local optimum. Initialization was uniformly random between the bounds X_j,low = 0 and X_j,high = 1000. The pawn weight was fixed to 100. If parameters were out of bounds after crossover, we set them to the bound values. The average values of the initialized generation were:
X_pawn   = 100    X_rook     = 645
X_knight = 542    X_queen    = 463
X_bishop = 417    X_mobility = 495
The standard deviations of the parameters were:
σ_pawn   = 0      σ_rook     = 316
σ_knight = 316    σ_queen    = 282
σ_bishop = 225    σ_mobility = 265
During 50 generations, 5000 games were played, with the search algorithm going to a depth of 4 ply. The average values and standard deviations of the parameters are shown in Figures 1 and 2. At the end of the experiment we obtained a population with the following average parameter values:
X_pawn   = 100    X_rook     = 440
X_knight = 264    X_queen    = 920
X_bishop = 312    X_mobility = 7
and its corresponding standard deviations:
σ_pawn   = 0      σ_rook     = 171
σ_knight = 98     σ_queen    = 58
σ_bishop = 99     σ_mobility = 4
The acquired results reflect the relative values of the pieces and of mobility, which are in accordance with known values from chess theory.
In the second experiment we changed the learning parameter L to 0.25 and the scaling factor F to 1.5. The parameter L enables additional tuning of the parameters during the competition and is responsible for the convergence of poor individuals toward the ones with better parameters. Because L causes quicker convergence, we use the parameter F, which diversifies the vectors, as a remedy against local optima. The remaining parameters were the same as in the first experiment. The average values of the initialized generation were:
X_pawn   = 100    X_rook     = 410
X_knight = 796    X_queen    = 595
X_bishop = 677    X_mobility = 520
and its corresponding standard deviations:
σ_pawn   = 0      σ_rook     = 272
σ_knight = 117    σ_queen    = 281
σ_bishop = 253    σ_mobility = 269
After 50 generations we obtained a population with the following average parameter values:
X_pawn   = 100    X_rook     = 488
X_knight = 247    X_queen    = 801
X_bishop = 293    X_mobility = 7
and its corresponding standard deviations:
σ_pawn   = 0      σ_rook     = 18
σ_knight = 21     σ_queen    = 68
σ_bishop = 16     σ_mobility = 1
The average values of the parameters and their corresponding standard deviations during 50 generations are shown in Figures 3 and 4. According to the acquired results, we can see that with the tuning process during competition the final population has a smaller standard deviation. Figures 1-4 depict the standard deviations and average parameter values, which additionally indicate that DE with the tuning process during the competition has better convergence. To examine this, we compared the relative efficiency of the best individual in each generation for both experiments against the following parameters taken from chess theory:
[Figure 1 omitted: per-generation curves of average parameter value (0-1000) for mobility, knight, bishop, rook, and queen]
Fig. 1. Average parameter values of populations during 50 generations without tuning during competition
[Figure 2 omitted: per-generation curves of standard deviation (0-400) for mobility, knight, bishop, rook, and queen]
Fig. 2. Standard deviation of parameters in populations during 50 generations without tuning during competition
X_pawn   = 100    X_rook     = 500
X_knight = 300    X_queen    = 900
X_bishop = 330    X_mobility = 10
Relative efficiency was measured according to the results of 100 games between the best individual's parameters and the aforementioned parameters. Both players played 50 games as the white and 50 games as the black player.
To make the results more representative, both experiments were repeated 10 times. The approximated average values of relative efficiency for both experiments are depicted in Figure 5. Both experiments started out with identical populations, but the relative efficiency of our approach, as seen from the figure, is better than that of the standard DE.
[Figure 3 omitted: per-generation curves of average parameter value (0-900) for mobility, knight, bishop, rook, and queen]
Fig. 3. Average parameter values of populations during 50 generations with tuning during competition
[Figure 4 omitted: per-generation curves of standard deviation (0-300) for mobility, knight, bishop, rook, and queen]
Fig. 4. Standard deviation of parameters in populations during 50 generations with tuning during competition
VI. CONCLUSIONS
We have proposed a method for the tuning of a chess evaluation function based on the Differential Evolution algorithm. We chose the DE/rand/2/bin strategy, which, over 50 generations, yielded a population with good parameters. Because these parameters had a relatively large standard deviation, we decided to apply additional tuning during the competition in order to reduce the standard deviation and improve convergence.
The results show that the evaluation function was successfully improved over the initial population in both experiments. The weights of the evaluation function correspond to known values from chess theory. The second experiment used tuning during the competition, which allowed DE to obtain good parameters faster than in the first experiment.
[Figure 5 omitted: per-generation relative-efficiency curves (0-50) for DE (F=1.5, L=0.25) and DE (F=0.5, L=0.00)]
Fig. 5. Average relative efficiency for 10 runs between DE with and without learning during competition in the evolutionary process. 100 games were played between the best individuals in each generation and the parameters taken from chess theory
In our experiments we used a simple evaluation function. This evaluation function enables the comparison of the results with known good parameters from chess theory. In this way we can conclude that our method enables the tuning of a chess evaluation function according only to the final outcomes of games. Therefore, in the future we will try to develop a strong chess program with a more complex evaluation function and our evolutionary method. Because our method is, similarly to DE, very sensitive to control parameters, we will also focus on a self-adapting [4], [12] DE algorithm for the tuning of the chess evaluation function.
REFERENCES
[1] Jonathan Baxter, Andrew Tridgell, and Lex Weaver. Experiments in parameter learning using temporal differences. International Computer Chess Association Journal, 21(2):84-99, 1998.
[2] Jonathan Baxter, Andrew Tridgell, and Lex Weaver. Learning to play chess using temporal differences. Machine Learning, 40(3):243-263, 2000.
[3] B. Bošković, S. Greiner, J. Brest, and V. Žumer. The representation of chess game. In Proceedings of the 27th International Conference on Information Technology Interfaces, pages 381-386, 2005.
[4] J. Brest, S. Greiner, B. Bošković, M. Mernik, and V. Žumer. Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems. IEEE Transactions on Evolutionary Computation. Accepted.
[5] David B. Fogel, Timothy J. Hays, Sarah L. Hahn, and James Quon. A self-learning evolutionary chess program. Proceedings of the IEEE, 92(12):1947-1954, 2004.
[6] E. A. Heinz. Scalable Search in Computer Chess: Algorithmic Enhancements and Experiments at High Search Depths. Morgan Kaufmann Publishers, 1999.
[7] Graham Kendall and Glenn Whitwell. An evolutionary approach for the tuning of a chess evaluation function using population dynamics. In Proceedings of the 2001 Congress on Evolutionary Computation CEC2001, pages 995-1002, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 2001. IEEE Press.
[8] J. Liu and J. Lampinen. Adaptive Parameter Control of Differential Evolution. In Proceedings of the 8th International Conference on Soft Computing (MENDEL 2002), pages 19-26, 2002.
[9] J. Liu and J. Lampinen. On Setting the Control Parameter of the Differential Evolution Method. In Proceedings of the 8th International Conference on Soft Computing (MENDEL 2002), pages 11-18, 2002.
[10] K. Price and R. Storn. Differential Evolution: A Simple Evolution Strategy for Fast Optimization. Dr. Dobb's Journal of Software Tools, 22(4):18-24, April 1997.
[11] K. V. Price, R. M. Storn, and J. A. Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Springer, 2005.
[12] A. K. Qin and P. N. Suganthan. Self-adaptive Differential Evolution Algorithm for Numerical Optimization. In The 2005 IEEE Congress on Evolutionary Computation CEC2005, volume 2, pages 1785-1791. IEEE Press, Sept. 2005. DOI: 10.1109/CEC.2005.1554904.
[13] J. Rönkkönen, S. Kukkonen, and K. V. Price. Real-Parameter Optimization with Differential Evolution, 2005.
[14] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, (3):211-229, 1959.
[15] C. Shannon. Programming a computer for playing chess. Philosophical Magazine, 41(4):256, 1950.
[16] Sebastian Thrun. Learning to play the game of chess. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 1069-1076. The MIT Press, Cambridge, MA, 1995.