Design of Evaluation Functions using Neural Networks in the Game of Go
2003/12/15 Hashimoto Tsuyoshi 1
Design of Evaluation Functions using Neural
Networks in the Game of Go
Presentation and translation: Hashimoto Tsuyoshi
Authors: Hiroyuki Nagayoshi, Masaru Todoroki
Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo
Background
Go is the hardest game for computers, for the following two reasons:
1. The search space is vast.
2. Accurate evaluation functions are hard to construct.
We focus on problem 2.
Importance of evaluation functions
If an accurate evaluation function can be built:
Strong programs become possible even with shallow search.
Combined with best-first search, the search space can be reduced.
Difficulty of static evaluation functions
Chess: the loss and gain of pieces correlates so strongly with positional judgment that accurate evaluation is possible.
Shogi: by considering the loss and gain of pieces, mobility, and the soundness of castles, accurate evaluation is possible.
Go: it is hard to evaluate moyo (territorial frameworks) or influence accurately, and the life and death of stones is also difficult to evaluate without search.
Current Go evaluation functions
Life and death of stones + influence: the evaluation of uncertain territory such as moyo or influence is poor.
Learning by neural network: accurate evaluations cannot be learned, because there are too many parameters and symmetry is not taken into account.
The goal
As an evaluation function for Go, we
1. share parameters, and
2. use a multi-layer neural network whose units are locally connected.
We show its validity by learning from game records.
Characteristics of this network
Connections only within a 3 x 3 neighborhood
Equalization of outputs among stones of the same group
A bypass between each inner layer and the input layer
Structure of the neural network
Input layer
Inner layers
Output layer
Output: probability of becoming white territory, probability of becoming black territory
Input: presence or absence of a black stone, presence or absence of a white stone
Connection of units
Each unit connects to 36 units: the 3 x 3 neighborhoods in the inner layer directly below and in the input layer.
This describes the influence of stones spreading gradually.
(Diagram: input layer, inner layer, the inner layer directly below, and a single unit)
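As a minimal illustration of this local connectivity (not the authors' code: the tanh nonlinearity, zero padding at the board edge, and all names are my assumptions, and the exact accounting of the 36 connections is not fully specified on the slide), a single unit's activation from its 3 x 3 neighborhoods can be sketched as:

```python
import numpy as np

def local_unit_output(below, black, white, w_below, w_black, w_white, bias, i, j):
    """Activation of one unit at (i, j): a weighted sum over the 3x3
    neighborhoods of the layer below and of the two input planes.
    Out-of-board neighbors are treated as 0 (zero padding).
    The tanh nonlinearity and all names are illustrative assumptions."""
    def patch(plane):
        # Extract the 3x3 neighborhood of (i, j), zero-padded at the edge.
        p = np.zeros((3, 3))
        n = plane.shape[0]
        for di in range(-1, 2):
            for dj in range(-1, 2):
                if 0 <= i + di < n and 0 <= j + dj < n:
                    p[di + 1, dj + 1] = plane[i + di, j + dj]
        return p
    s = (np.sum(w_below * patch(below)) + np.sum(w_black * patch(black))
         + np.sum(w_white * patch(white)) + bias)
    return np.tanh(s)
```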
Sharing parameters
(Diagram: input layer and the layer above it)
Parameters are shared according to the positional relation between units, yielding a symmetric neural network.
3 categories:
1. Directly below
2. Vertically and horizontally adjacent
3. Diagonally adjacent
Sharing parameters
(Diagram: corner, edge, and center points of the board)
•3 kinds of parameters (corner, edge, center)
•The parameters are independent of the board size
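A small sketch of why only three parameter classes are needed regardless of board size: every point falls into one of the three classes the slide names, so the parameter count never grows with the board. The exact class boundaries here are an assumption.

```python
def point_class(i, j, n):
    """Classify a board point on an n x n board into one of the three
    parameter classes named on the slide: 'corner', 'edge', or 'center'.
    Because only these three classes exist, the number of parameters
    does not depend on the board size n."""
    on_i = i in (0, n - 1)
    on_j = j in (0, n - 1)
    if on_i and on_j:
        return "corner"
    if on_i or on_j:
        return "edge"
    return "center"
```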
Equalization of outputs within the same group
•Stones belonging to the same group share the same life and death, so identical outputs are desirable.
•Equalization enforces these identical outputs.
(Diagram: input position and the structure of the group)
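The equalization step might be sketched as follows. Averaging over the group is an illustrative choice on my part; the slides only state that outputs within a group are equated. Board representation and names are assumptions.

```python
def equalize_group_outputs(board, outputs):
    """Replace each stone's output with the mean output of its group
    (a maximal chain of same-colored, orthogonally connected stones),
    so that all stones sharing a life-and-death fate share one value.
    board: 2D list of 'B', 'W', or None; outputs: 2D list of floats."""
    n = len(board)
    seen = set()
    result = [row[:] for row in outputs]
    for i in range(n):
        for j in range(n):
            if board[i][j] is None or (i, j) in seen:
                continue
            # Flood-fill the group of same-colored stones.
            color, group, stack = board[i][j], [], [(i, j)]
            while stack:
                x, y = stack.pop()
                if (x, y) in seen:
                    continue
                seen.add((x, y))
                group.append((x, y))
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < n and 0 <= ny < n
                            and board[nx][ny] == color and (nx, ny) not in seen):
                        stack.append((nx, ny))
            mean = sum(outputs[x][y] for x, y in group) / len(group)
            for x, y in group:
                result[x][y] = mean
    return result
```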
Effect of output equalization within the same group
Equalization decreases the learning errors.
(Chart: learning errors, 0 to 3, versus the number of inner layers, 0 to 5, with and without equalization)
Training of the network
Training was first done by self-play learning, as in TD-learning, but with no good results. The reason may be that the programs are too weak.
Here we use game records of professional players instead!
Description of positions at the input layer
A position is presented to the input layer as two planes, one per color: a point holding a stone of that color is 1, otherwise 0.
(Diagram: a board shape mapped onto the black and white input planes)
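A minimal sketch of this two-plane encoding (function name and board representation are my assumptions):

```python
def encode_position(board):
    """Encode a position as the two binary input planes the slide
    describes: one for the presence of a black stone and one for a
    white stone (1 = stone present, 0 = absent).
    board: 2D list of 'B', 'W', or None."""
    black = [[1 if p == 'B' else 0 for p in row] for row in board]
    white = [[1 if p == 'W' else 0 for p in row] for row in board]
    return black, white
```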
Training data
The input is a position taken from a game record. The teacher data come from the game-end position: a point that ends as black territory is labeled 1, and a point that ends as white territory is labeled 0.
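The teacher data can be sketched as below. The slide only gives the labels 1 for black territory and 0 for white territory; how neutral points were handled is not stated, so the 0.5 here is purely my assumption.

```python
def teacher_labels(final_territory):
    """Build teacher data from a game-end position: a point that ends as
    black territory is labeled 1.0 and white territory 0.0, as on the
    slide. Neutral points get 0.5 as an assumption (not stated in the
    source). final_territory: 2D list of 'B', 'W', or None."""
    return [[1.0 if p == 'B' else 0.0 if p == 'W' else 0.5 for p in row]
            for row in final_territory]
```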
Speeding up learning
In a multi-layer neural network, simple back propagation makes learning considerably slow.
Here we implement learning with a quasi-Newton method, a method for non-linear optimization.
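The quasi-Newton idea can be illustrated on a toy problem. This is a generic sketch, not the authors' implementation: BFGS with an exact line search on a 2-D quadratic, with all names and the test problem my own.

```python
import numpy as np

# Toy problem: minimize the quadratic f(x) = 0.5 x^T A x - b^T x,
# whose unique minimizer is x* = A^{-1} b.
A = np.array([[1.0, 0.0], [0.0, 10.0]])  # ill-conditioned on purpose
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def grad(x):
    return A @ x - b

def exact_step(x, d):
    # Optimal step length along direction d for this quadratic.
    return -(grad(x) @ d) / (d @ A @ d)

def steepest_descent(x, iters):
    for _ in range(iters):
        d = -grad(x)
        x = x + exact_step(x, d) * d
    return x

def quasi_newton_bfgs(x, iters):
    # BFGS maintains an approximation H of the inverse Hessian,
    # updated from the observed gradient differences.
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        d = -H @ g
        s = exact_step(x, d) * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(len(x))
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

x0 = np.zeros(2)
err_sd = np.linalg.norm(steepest_descent(x0, 5) - x_star)
err_qn = np.linalg.norm(quasi_newton_bfgs(x0, 5) - x_star)
```

On this ill-conditioned quadratic, the quasi-Newton iterate reaches the minimizer far sooner than steepest descent, mirroring the comparison on the next slide.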
Effect of the quasi-Newton method
The quasi-Newton method decreases the learning errors faster than the steepest descent method.
(Chart: learning errors, 0.1 to 100 on a log scale, versus iteration number, 0 to 1000, for the steepest descent and quasi-Newton methods)
Learning at end positions
100 end positions were extracted from game records: 80 positions are data for training and 20 positions are data for verification.
The number of inner layers ranges from 1 to 6, and learning runs for 10000 iterations.
Results
No over-fitting is observed.
(Chart: errors per position, 0 to 7, versus the number of inner layers, 0 to 6, for the learning error and the prediction error)
Results
(Diagrams: output maps over the board, on a scale from -1 to +1, for networks with 2 inner layers and with 6 inner layers)
Learning the probability of becoming territory
50 game records were used: 30 for learning and 20 for verification.
The estimated probabilities are compared with the posterior (statistical) probabilities.
Results
(Charts: statistical probability (%) versus predicted probability (%), binned from 5% to 100%, for the learning data and for the verification data)
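Such a comparison of predicted and statistical probabilities can be sketched as a calibration table, using the 5% bins the charts show. This is a generic sketch of the binning, not the authors' code, and all names are assumptions.

```python
def calibration_table(predicted, outcomes, bin_width=0.05):
    """Group predictions into bins of width bin_width (5%, 10%, ... as on
    the slide, keyed by the bin's upper edge in percent) and compute the
    statistical (empirical) probability in each bin: the fraction of
    points that actually became territory.
    predicted: list of probabilities in [0, 1]; outcomes: list of 0/1."""
    bins = {}
    for p, o in zip(predicted, outcomes):
        b = min(int(p / bin_width), int(1 / bin_width) - 1)  # bin index
        bins.setdefault(b, []).append(o)
    return {round((b + 1) * bin_width * 100): 100 * sum(os) / len(os)
            for b, os in sorted(bins.items())}
```

A well-calibrated network would put every bin's statistical probability close to its predicted probability, which is what the slide's charts plot.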
Current problems
The assessment of life and death is not yet proper. One reason is that too few game records were used for learning.
The number of liberties or eyes may also be necessary as input to the network.
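Liberties, mentioned above as a possible extra input, can be counted with a standard flood fill over the group. This is a generic sketch, not part of the presented work; the board representation is an assumption.

```python
def count_liberties(board, i, j):
    """Count the liberties of the group containing the stone at (i, j):
    the number of distinct empty points orthogonally adjacent to any
    stone of that group. board: 2D list of 'B', 'W', or None."""
    color = board[i][j]
    n = len(board)
    seen, liberties, stack = {(i, j)}, set(), [(i, j)]
    while stack:
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < n and 0 <= ny < n):
                continue
            if board[nx][ny] is None:
                liberties.add((nx, ny))       # empty neighbor: a liberty
            elif board[nx][ny] == color and (nx, ny) not in seen:
                seen.add((nx, ny))            # same-colored stone: same group
                stack.append((nx, ny))
    return len(liberties)
```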
Summary
We proposed a multi-layer neural network evaluation function. Its features are the local connection of its units and parameter sharing that takes the invariances of Go positions into account. Using game records, we obtained good learning results both for end positions and for predicting the probability of territory.