Design of Evaluation Functions using Neural Networks in the Game of Go
2003/12/15 Hashimoto Tsuyoshi 1
Design of Evaluation Functions using Neural
Networks in the Game of Go
Presentation and translation: Hashimoto Tsuyoshi
Authors: Hiroyuki Nagayoshi, Masaru Todoroki
Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo
Background
Go is the hardest game for computers, for the following two reasons:
1. The search space is vast.
2. Accurate evaluation functions are hard to construct.
We focus on problem 2.
Importance of evaluation functions
If an accurate evaluation function can be built:
Strong programs become possible even with shallow search.
Combined with best-first search, the search space can be reduced.
Difficulty of static evaluation functions
Chess: the loss and gain of pieces correlates so strongly with positional judgment that accurate evaluation is possible.
Shogi: by considering the loss and gain of pieces, mobility, and the soundness of castles, accurate evaluation is possible.
Go: it is hard to evaluate moyo (territorial frameworks) or influence accurately, and the life and death of stones is also difficult to evaluate without search.
Current Go evaluation functions
Life and death of stones + influence: the evaluation of uncertain territory such as moyo or influence is poor.
Learning by neural network: accurate evaluations cannot be learned, because there are too many parameters and symmetry is not taken into account.
The goal
As an evaluation function for Go, we
1. share parameters, and
2. use a multi-layer neural network whose units are locally connected.
We show its validity by learning from game records.
Characteristics of this network
Connections only within a 3 x 3 neighborhood
Equalization of outputs among stones of the same group
A bypass between each inner layer and the input layer
Structure of the neural network
Input layer
Inner layers
Output layer
Output: probability of becoming white territory, probability of becoming black territory
Input: presence or absence of a black stone, presence or absence of a white stone
Connection of units
Each unit connects to 36 units: the 3 x 3 neighborhoods in the inner layer directly below and in the input layer.
This describes the influence of stones spreading gradually.
(Diagram: input layer, inner layer, the inner layer directly below, and a single unit)
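As a minimal illustration of this local connectivity (not the authors' code: the tanh nonlinearity, zero padding at the board edge, and all names are my assumptions, and the exact accounting of the 36 connections is not fully specified on the slide), a single unit's activation from its 3 x 3 neighborhoods can be sketched as:

```python
import numpy as np

def local_unit_output(below, black, white, w_below, w_black, w_white, bias, i, j):
    """Activation of one unit at (i, j): a weighted sum over the 3x3
    neighborhoods of the layer below and of the two input planes.
    Out-of-board neighbors are treated as 0 (zero padding).
    The tanh nonlinearity and all names are illustrative assumptions."""
    def patch(plane):
        # Extract the 3x3 neighborhood of (i, j), zero-padded at the edge.
        p = np.zeros((3, 3))
        n = plane.shape[0]
        for di in range(-1, 2):
            for dj in range(-1, 2):
                if 0 <= i + di < n and 0 <= j + dj < n:
                    p[di + 1, dj + 1] = plane[i + di, j + dj]
        return p
    s = (np.sum(w_below * patch(below)) + np.sum(w_black * patch(black))
         + np.sum(w_white * patch(white)) + bias)
    return np.tanh(s)
```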
Sharing parameters
(Diagram: input layer and the layer above it)
Parameters are shared according to the positional relation between units, yielding a symmetric neural network.
3 categories:
1. Directly below
2. Vertically and horizontally adjacent
3. Diagonally adjacent
Sharing parameters
(Diagram: corner, edge, and center points of the board)
•3 kinds of parameters (corner, edge, center)
•The parameters are independent of the board size
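A small sketch of why only three parameter classes are needed regardless of board size: every point falls into one of the three classes the slide names, so the parameter count never grows with the board. The exact class boundaries here are an assumption.

```python
def point_class(i, j, n):
    """Classify a board point on an n x n board into one of the three
    parameter classes named on the slide: 'corner', 'edge', or 'center'.
    Because only these three classes exist, the number of parameters
    does not depend on the board size n."""
    on_i = i in (0, n - 1)
    on_j = j in (0, n - 1)
    if on_i and on_j:
        return "corner"
    if on_i or on_j:
        return "edge"
    return "center"
```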
Equalization of outputs within the same group
•Stones belonging to the same group share the same life and death, so identical outputs are desirable.
•Equalization enforces these identical outputs.
(Diagram: input position and the structure of the group)
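The equalization step might be sketched as follows. Averaging over the group is an illustrative choice on my part; the slides only state that outputs within a group are equated. Board representation and names are assumptions.

```python
def equalize_group_outputs(board, outputs):
    """Replace each stone's output with the mean output of its group
    (a maximal chain of same-colored, orthogonally connected stones),
    so that all stones sharing a life-and-death fate share one value.
    board: 2D list of 'B', 'W', or None; outputs: 2D list of floats."""
    n = len(board)
    seen = set()
    result = [row[:] for row in outputs]
    for i in range(n):
        for j in range(n):
            if board[i][j] is None or (i, j) in seen:
                continue
            # Flood-fill the group of same-colored stones.
            color, group, stack = board[i][j], [], [(i, j)]
            while stack:
                x, y = stack.pop()
                if (x, y) in seen:
                    continue
                seen.add((x, y))
                group.append((x, y))
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < n and 0 <= ny < n
                            and board[nx][ny] == color and (nx, ny) not in seen):
                        stack.append((nx, ny))
            mean = sum(outputs[x][y] for x, y in group) / len(group)
            for x, y in group:
                result[x][y] = mean
    return result
```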
Effect of output equalization within the same group
Equalization decreases the learning errors.
(Chart: learning errors, 0 to 3, versus the number of inner layers, 0 to 5, with and without equalization)
Training of the network
Training was first done by self-play learning, as in TD-learning, but with no good results. The reason may be that the programs are too weak.
Here we use game records of professional players instead!
Description of positions at the input layer
A position is presented to the input layer as two planes, one per color: a point holding a stone of that color is 1, otherwise 0.
(Diagram: a board shape mapped onto the black and white input planes)
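A minimal sketch of this two-plane encoding (function name and board representation are my assumptions):

```python
def encode_position(board):
    """Encode a position as the two binary input planes the slide
    describes: one for the presence of a black stone and one for a
    white stone (1 = stone present, 0 = absent).
    board: 2D list of 'B', 'W', or None."""
    black = [[1 if p == 'B' else 0 for p in row] for row in board]
    white = [[1 if p == 'W' else 0 for p in row] for row in board]
    return black, white
```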
Training data
The input is a position taken from a game record. The teacher data come from the game-end position: a point that ends as black territory is labeled 1, and a point that ends as white territory is labeled 0.
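The teacher data can be sketched as below. The slide only gives the labels 1 for black territory and 0 for white territory; how neutral points were handled is not stated, so the 0.5 here is purely my assumption.

```python
def teacher_labels(final_territory):
    """Build teacher data from a game-end position: a point that ends as
    black territory is labeled 1.0 and white territory 0.0, as on the
    slide. Neutral points get 0.5 as an assumption (not stated in the
    source). final_territory: 2D list of 'B', 'W', or None."""
    return [[1.0 if p == 'B' else 0.0 if p == 'W' else 0.5 for p in row]
            for row in final_territory]
```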
Speeding up learning
In a multi-layer neural network, simple back propagation makes learning considerably slow.
Here we implement learning with a quasi-Newton method, a method for non-linear optimization.
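The quasi-Newton idea can be illustrated on a toy problem. This is a generic sketch, not the authors' implementation: BFGS with an exact line search on a 2-D quadratic, with all names and the test problem my own.

```python
import numpy as np

# Toy problem: minimize the quadratic f(x) = 0.5 x^T A x - b^T x,
# whose unique minimizer is x* = A^{-1} b.
A = np.array([[1.0, 0.0], [0.0, 10.0]])  # ill-conditioned on purpose
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def grad(x):
    return A @ x - b

def exact_step(x, d):
    # Optimal step length along direction d for this quadratic.
    return -(grad(x) @ d) / (d @ A @ d)

def steepest_descent(x, iters):
    for _ in range(iters):
        d = -grad(x)
        x = x + exact_step(x, d) * d
    return x

def quasi_newton_bfgs(x, iters):
    # BFGS maintains an approximation H of the inverse Hessian,
    # updated from the observed gradient differences.
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        d = -H @ g
        s = exact_step(x, d) * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        rho = 1.0 / (y @ s)
        I = np.eye(len(x))
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

x0 = np.zeros(2)
err_sd = np.linalg.norm(steepest_descent(x0, 5) - x_star)
err_qn = np.linalg.norm(quasi_newton_bfgs(x0, 5) - x_star)
```

On this ill-conditioned quadratic, the quasi-Newton iterate reaches the minimizer far sooner than steepest descent, mirroring the comparison on the next slide.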
Effect of the quasi-Newton method
The quasi-Newton method decreases the learning errors faster than the steepest descent method.
(Chart: learning errors, 0.1 to 100 on a log scale, versus iteration number, 0 to 1000, for the steepest descent and quasi-Newton methods)
Learning at end positions
100 end positions were extracted from game records: 80 positions are data for training and 20 positions are data for verification.
The number of inner layers ranges from 1 to 6, and learning runs for 10000 iterations.
Results
No over-fitting is observed.
(Chart: errors per position, 0 to 7, versus the number of inner layers, 0 to 6, for the learning error and the prediction error)
Results
(Diagrams: output maps over the board, on a scale from -1 to +1, for networks with 2 inner layers and with 6 inner layers)
Learning the probability of becoming territory
50 game records were used: 30 for learning and 20 for verification.
The estimated probabilities are compared with the posterior (statistical) probabilities.
Results
(Charts: statistical probability (%) versus predicted probability (%), binned from 5% to 100%, for the learning data and for the verification data)
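Such a comparison of predicted and statistical probabilities can be sketched as a calibration table, using the 5% bins the charts show. This is a generic sketch of the binning, not the authors' code, and all names are assumptions.

```python
def calibration_table(predicted, outcomes, bin_width=0.05):
    """Group predictions into bins of width bin_width (5%, 10%, ... as on
    the slide, keyed by the bin's upper edge in percent) and compute the
    statistical (empirical) probability in each bin: the fraction of
    points that actually became territory.
    predicted: list of probabilities in [0, 1]; outcomes: list of 0/1."""
    bins = {}
    for p, o in zip(predicted, outcomes):
        b = min(int(p / bin_width), int(1 / bin_width) - 1)  # bin index
        bins.setdefault(b, []).append(o)
    return {round((b + 1) * bin_width * 100): 100 * sum(os) / len(os)
            for b, os in sorted(bins.items())}
```

A well-calibrated network would put every bin's statistical probability close to its predicted probability, which is what the slide's charts plot.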
Current problems
The assessment of life and death is not yet proper. One reason is that too few game records were used for learning.
The number of liberties or eyes may also be necessary as input to the network.
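Liberties, mentioned above as a possible extra input, can be counted with a standard flood fill over the group. This is a generic sketch, not part of the presented work; the board representation is an assumption.

```python
def count_liberties(board, i, j):
    """Count the liberties of the group containing the stone at (i, j):
    the number of distinct empty points orthogonally adjacent to any
    stone of that group. board: 2D list of 'B', 'W', or None."""
    color = board[i][j]
    n = len(board)
    seen, liberties, stack = {(i, j)}, set(), [(i, j)]
    while stack:
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < n and 0 <= ny < n):
                continue
            if board[nx][ny] is None:
                liberties.add((nx, ny))       # empty neighbor: a liberty
            elif board[nx][ny] == color and (nx, ny) not in seen:
                seen.add((nx, ny))            # same-colored stone: same group
                stack.append((nx, ny))
    return len(liberties)
```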
Summary
We proposed a multi-layer neural network evaluation function. Its features are the local connection of its units and parameter sharing that takes the invariances of Go positions into account. Using game records, we obtained good learning results both for end positions and for predicting the probability of territory.