Design of Evaluation Functions using Neural Networks in the Game of Go

2003/12/15. Presentation and translation: Hashimoto Tsuyoshi. Authors: Hiroyuki Nagayoshi, Masaru Todoroki, Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo.


Page 1


Design of Evaluation Functions using Neural Networks in the Game of Go

Presentation and translation: Hashimoto Tsuyoshi

Authors: Hiroyuki Nagayoshi, Masaru Todoroki, Department of Quantum Engineering and System Science, School of Engineering, The University of Tokyo

Page 2

Background

Go is the hardest game for computers, for the following two reasons:

1. The search space is vast.
2. Evaluation functions are difficult to design.

We focus on problem 2.

Page 3

Importance of evaluation functions

If an accurate evaluation function can be made:

It is possible to build strong programs even with shallow search.

Combined with best-first search, it can make the search space smaller.

Page 4

Difficulty of static evaluation functions

Chess: gains and losses of pieces correlate so strongly with positional judgment that accurate evaluation is possible.

Shogi: by considering gains and losses of pieces, mobility, and the solidity of castles, accurate evaluation is possible.

Go: it is hard to evaluate moyo or influence accurately. The life and death of stones is also difficult to evaluate without search.

Page 5

Current Go evaluation functions

Life and death of stones + influence: the evaluation of uncertain territory such as moyo or influence is poor.

Learning by neural networks: accurate evaluations cannot be learned because there are too many parameters and symmetry is not taken into account.

Page 6

The goal

As an evaluation function for Go, we use a multi-layer neural network that:

1. Shares parameters
2. Has locally connected units

We show its validity by training it on game records.

Page 7

Characteristics of this network

• Connections only within a 3 x 3 neighborhood

• Equalization of outputs within the same group

• Bypass connections between each inner layer and the input layer

Page 8

Structure of neural network

[Figure: layered network]

• Input layer: presence or absence of a black stone; presence or absence of a white stone

• Inner layers

• Output layer: probability of being black territory; probability of being white territory

Page 9

Connection of units

Each unit is connected to 36 units (the 3 x 3 neighborhoods on the layer below and on the input layer).

This describes the influence of stones spreading gradually.

[Figure: a unit in an inner layer, connected to the inner layer right under it and to the input layer]
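A minimal Python sketch of one inner layer may make the connection rule concrete. Beyond what the slides state, it assumes that every layer holds two units per intersection (which would account for the 36 connections: 2 x 9 from the layer below plus 2 x 9 from the input layer), that units use a logistic sigmoid, that off-board neighbors are simply skipped, and that the last inner layer's two planes are read as the output probabilities. The names inner_layer and forward and the weight-table layout are hypothetical, and weights are not yet shared here.

```python
import math

# Hypothetical layout: each layer is planes[channel][y][x] with 2 channels.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neighborhood(y, x, size):
    """Yield (dy, dx, ny, nx) for the on-board points of the 3 x 3 neighborhood."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < size and 0 <= nx < size:  # off-board points are skipped
                yield dy, dx, ny, nx


def inner_layer(below, inputs, w_below, w_input, bias):
    """One inner layer: each unit reads the 3 x 3 neighborhood of the layer
    below and, via the bypass, the same neighborhood of the input layer.
    Weights are indexed w[c_out][c_in][dy + 1][dx + 1] (hypothetical layout)."""
    size = len(inputs[0])
    out = [[[0.0] * size for _ in range(size)] for _ in range(2)]
    for c_out in range(2):
        for y in range(size):
            for x in range(size):
                z = bias[c_out]
                for dy, dx, ny, nx in neighborhood(y, x, size):
                    for c_in in range(2):
                        z += w_below[c_out][c_in][dy + 1][dx + 1] * below[c_in][ny][nx]
                        z += w_input[c_out][c_in][dy + 1][dx + 1] * inputs[c_in][ny][nx]
                out[c_out][y][x] = sigmoid(z)
    return out


def forward(inputs, layers):
    """Apply inner layers in order; for the first one the layer below is
    the input layer itself, so both terms read the same planes."""
    below = inputs
    for w_below, w_input, bias in layers:
        below = inner_layer(below, inputs, w_below, w_input, bias)
    return below  # assumed: the last layer's two planes are the outputs
```

Because each layer looks only one intersection away, a stone changes the outputs at most one point further per added layer, which is one way to read "influence of stones gradually spreading".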

Page 10

Sharing parameters

[Figure: connections from a unit to the under layer and to the input layer]

Parameters are shared according to the positional relation between units, giving a symmetric neural network. There are 3 categories:

1. Right under
2. Vertical and horizontal
3. Diagonal
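The three categories can be illustrated with a hypothetical helper that builds a 3 x 3 weight kernel from one weight for the unit right under, one for the four vertical and horizontal neighbors, and one for the four diagonal neighbors. This is a sketch under that reading of the slide, not the authors' code. It checks that such a kernel is unchanged by all eight rotations and reflections of the board, which is what makes the network symmetric; a free 3 x 3 kernel would need 9 weights instead of 3.

```python
def shared_kernel(w_under, w_orth, w_diag):
    """Hypothetical 3 x 3 kernel built from three shared parameters."""
    return [
        [w_diag, w_orth, w_diag],
        [w_orth, w_under, w_orth],
        [w_diag, w_orth, w_diag],
    ]


def rotate90(k):
    """Rotate a square kernel by 90 degrees clockwise."""
    return [list(row) for row in zip(*k[::-1])]


def reflect(k):
    """Mirror a kernel left to right."""
    return [row[::-1] for row in k]


def board_symmetries(k):
    """All 8 images of k under rotations and reflections of the board."""
    images = []
    for start in (k, reflect(k)):
        r = start
        for _ in range(4):
            images.append(r)
            r = rotate90(r)
    return images


k = shared_kernel(0.5, 0.2, -0.1)
print(all(img == k for img in board_symmetries(k)))  # True
```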

Page 11

Sharing parameters

[Figure: board regions: corner, edge, center]

• 3 kinds of parameters (corner, edge, center)

• Parameters are independent of board size
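Region-dependent parameters could be looked up as in the sketch below. It assumes that only first-line points count as edge points and that the four corner points form the corner region; the slide's figure may draw these regions differently, and the parameter values in PARAMS are placeholders. Because the table is keyed by region rather than by coordinate, the same three parameter sets serve a 9 x 9 or a 19 x 19 board.

```python
def region(y, x, size):
    """Classify a board point as 'corner', 'edge' or 'center' (assumed rule)."""
    on_edge_y = y in (0, size - 1)
    on_edge_x = x in (0, size - 1)
    if on_edge_y and on_edge_x:
        return "corner"
    if on_edge_y or on_edge_x:
        return "edge"
    return "center"


# Hypothetical parameter sets (right under, vertical/horizontal, diagonal),
# one per region and shared by every point in that region.
PARAMS = {
    "corner": (0.3, 0.10, 0.05),
    "edge": (0.4, 0.15, 0.05),
    "center": (0.5, 0.20, 0.10),
}


def params_for(y, x, size):
    return PARAMS[region(y, x, size)]


def count_regions(size):
    counts = {"corner": 0, "edge": 0, "center": 0}
    for y in range(size):
        for x in range(size):
            counts[region(y, x, size)] += 1
    return counts


print(count_regions(19))  # {'corner': 4, 'edge': 68, 'center': 289}
```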

Page 12

Equalization of outputs within the same group

• Stones belonging to the same group share the same life and death, so the same outputs are desirable.

• Equalization realizes the same outputs!

[Figure: input position and the structure of its groups]
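One plausible way to implement the equalization is to find each group of connected same-colored stones and give every stone in it the group's mean output. The averaging rule, the 4-neighbor connectivity, and the 'B'/'W'/'.' board notation are assumptions of this sketch; the slides do not say which equalizing operation is used. With two output planes, equalize() would be applied to each plane separately.

```python
def groups(board):
    """Return lists of points forming connected groups of same-colored stones.
    board[y][x] is 'B', 'W' or '.' (empty)."""
    size = len(board)
    seen = set()
    result = []
    for y in range(size):
        for x in range(size):
            if board[y][x] == "." or (y, x) in seen:
                continue
            color = board[y][x]
            stack, group = [(y, x)], []
            seen.add((y, x))
            while stack:  # flood fill over 4-connected neighbors
                cy, cx = stack.pop()
                group.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < size and 0 <= nx < size
                            and (ny, nx) not in seen and board[ny][nx] == color):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            result.append(group)
    return result


def equalize(output, board):
    """Return a copy of one output plane in which every stone gets the mean
    output of its group; empty points are left unchanged."""
    out = [row[:] for row in output]
    for group in groups(board):
        mean = sum(output[y][x] for y, x in group) / len(group)
        for y, x in group:
            out[y][x] = mean
    return out
```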

Page 13

Effect of output equalization within the same group

Equalization decreases learning errors.

[Figure: learning error (0 to 3) vs. number of inner layers (0 to 5), without and with equalization]

Page 14

Training of network

Networks have been trained by self-play learning, such as TD learning, with no good results. The reason: the programs are too weak?

Here we use game records of professional players!

Page 15

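Description of positions at the input layer

[Figure: a board shape and its input layer, with separate black and white planes; a point is 1 where a stone of that color is present and 0 otherwise]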
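The input encoding can be sketched as two 0/1 planes, one per color. The 'B'/'W'/'.' string notation for a position and the name encode are assumptions made for illustration.

```python
def encode(board):
    """Return (black, white) input planes for a board given as strings of
    'B' (black stone), 'W' (white stone) and '.' (empty)."""
    black = [[1 if c == "B" else 0 for c in row] for row in board]
    white = [[1 if c == "W" else 0 for c in row] for row in board]
    return black, white


black, white = encode(["B..", ".W.", "..B"])
print(black)  # [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
print(white)  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```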

Page 16

Training data

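[Figure: an input position paired with teacher data taken from the game-end position; the black territory and white territory maps mark a point 1 if it is that color's territory and 0 otherwise]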
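The teacher data and a learning error can be sketched in the same notation. Here final_owner is a hypothetical map of the game-end position in which 'B' and 'W' mark points counted as black or white territory and '.' marks neutral points. The sum of squared differences is an assumed error measure, since the slides do not state which error function is minimized.

```python
def teacher_planes(final_owner):
    """Target planes from a game-end ownership map ('B', 'W' or '.')."""
    black = [[1.0 if c == "B" else 0.0 for c in row] for row in final_owner]
    white = [[1.0 if c == "W" else 0.0 for c in row] for row in final_owner]
    return black, white


def training_pairs(positions, final_owner):
    """Pair every position of one game with that game's end-of-game targets."""
    target = teacher_planes(final_owner)
    return [(position, target) for position in positions]


def squared_error(output, target):
    """Assumed learning error: sum of squared differences over both planes."""
    return sum(
        (o - t) ** 2
        for out_plane, tgt_plane in zip(output, target)
        for out_row, tgt_row in zip(out_plane, tgt_plane)
        for o, t in zip(out_row, tgt_row)
    )


target = teacher_planes(["BB", "W."])
output = ([[0.5, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.5]])
print(squared_error(output, target))  # 0.5
```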

Page 17

Speeding up learning

With a multi-layer neural network, simple backpropagation makes learning considerably slow.

Here we implement learning with the quasi-Newton method, a method for nonlinear optimization.
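The difference between the two optimizers can be seen on a toy problem. The sketch below is not the network's training loss: it minimizes a hypothetical two-dimensional quadratic with condition number 50 from a start point where steepest descent zigzags, and, because the function is quadratic, it uses an exact line search. The quasi-Newton method shown is BFGS, one common quasi-Newton variant (the slides do not name the variant used), which builds an estimate of the inverse Hessian from successive gradients.

```python
# Toy problem (hypothetical): f(x) = 0.5 * x^T A x, condition number 50.
A = [[1.0, 0.0], [0.0, 50.0]]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def grad(x):
    return matvec(A, x)  # gradient of the quadratic is A x


def exact_step(g, d):
    """Step length minimizing f along d; valid only for this quadratic."""
    return -dot(g, d) / dot(d, matvec(A, d))


def steepest_descent(x, tol=1e-8, max_iter=10000):
    for k in range(max_iter):
        g = grad(x)
        if dot(g, g) ** 0.5 < tol:
            return x, k
        d = [-gi for gi in g]
        t = exact_step(g, d)
        x = [xi + t * di for xi, di in zip(x, d)]
    return x, max_iter


def bfgs(x, tol=1e-8, max_iter=10000):
    n = len(x)
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # inverse-Hessian estimate
    g = grad(x)
    for k in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            return x, k
        d = [-v for v in matvec(h, g)]  # quasi-Newton direction
        t = exact_step(g, d)
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        rho = 1.0 / dot(s, y)  # s.y > 0 because A is positive definite
        hy = matvec(h, y)
        yhy = dot(y, hy)
        # BFGS inverse update: H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
        for i in range(n):
            for j in range(n):
                h[i][j] += (rho * rho * yhy + rho) * s[i] * s[j] - rho * (hy[i] * s[j] + s[i] * hy[j])
        x, g = x_new, g_new
    return x, max_iter


x_sd, k_sd = steepest_descent([50.0, 1.0])
x_qn, k_qn = bfgs([50.0, 1.0])
```

On this problem steepest descent needs several hundred iterations, while BFGS with exact line searches reaches the minimum in about as many steps as there are dimensions. On a real network loss, an inexact line search would be needed instead of exact_step.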

Page 18

Effect of the quasi-Newton method

The quasi-Newton method decreases learning errors faster than the steepest descent method.

[Figure: learning error (log scale, 0.1 to 100) vs. iteration number (0 to 1000), steepest descent method vs. quasi-Newton method]

Page 19

Learning at end positions

100 end positions were extracted from game records: 80 positions are used for training and 20 for verification.

The number of inner layers ranges from 1 to 6, and learning runs for 10000 iterations.

Page 20

Results

No over-fitting

[Figure: errors per position (0 to 7) vs. number of inner layers (0 to 6), learning error and prediction error]

Page 21

Results

[Figure: results for 2 inner layers and 6 inner layers, shown on a scale from -1 to +1]

Page 22

Learning the probability of territory

50 game records: 30 for learning and 20 for verification.

Estimated probabilities are compared with posterior probabilities.
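Comparing estimated with posterior probabilities amounts to a calibration check: predictions are grouped into 5% bins, matching the axis of the results figure, and the fraction of points in each bin that actually became territory is computed. The function name calibration and its inputs (one predicted probability and one 0/1 outcome per board point) are assumptions for illustration.

```python
def calibration(predicted, outcomes, n_bins=20):
    """Bin predicted probabilities (0..1) into n_bins equal-width bins
    (5% wide when n_bins=20). Return one row per non-empty bin:
    (bin upper edge in %, mean predicted %, observed %, number of points)."""
    counts = [0] * n_bins
    hits = [0] * n_bins
    p_sum = [0.0] * n_bins
    for p, outcome in zip(predicted, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        counts[b] += 1
        hits[b] += outcome  # 1 if the point became territory, else 0
        p_sum[b] += p
    rows = []
    for b in range(n_bins):
        if counts[b]:
            rows.append(((b + 1) * 100 // n_bins,
                         100.0 * p_sum[b] / counts[b],
                         100.0 * hits[b] / counts[b],
                         counts[b]))
    return rows


rows = calibration([0.12, 0.13, 0.14, 0.87, 0.88], [0, 1, 0, 1, 1])
```

A well-calibrated predictor gives rows whose observed percentage is close to the mean predicted percentage, i.e. points near the diagonal of the results figure.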

Page 23

Results

[Figure: statistical probability (%) vs. predicted probability (%) in 5% bins from 5 to 100, for data for learning and data for verification]

Page 24

Current problems

The assessment of life and death is not yet accurate. One reason is that too few game records were used for learning.

The number of liberties or eyes may be necessary as an input to the network.
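Liberties, one of the candidate inputs mentioned above, can be counted with a flood fill over a group. The sketch below uses a hypothetical 'B'/'W'/'.' notation and computes a liberty count per stone that could be fed to the network as an extra input plane. It illustrates the suggestion only and is not part of the presented network; liberty_plane is a hypothetical name.

```python
def liberties(board, y, x):
    """Count the liberties of the group containing the stone at (y, x)."""
    size = len(board)
    color = board[y][x]
    if color == ".":
        raise ValueError("no stone at this point")
    group, libs, stack = {(y, x)}, set(), [(y, x)]
    while stack:
        cy, cx = stack.pop()
        for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
            if not (0 <= ny < size and 0 <= nx < size):
                continue  # off the board
            if board[ny][nx] == ".":
                libs.add((ny, nx))  # empty neighbor = liberty (counted once)
            elif board[ny][nx] == color and (ny, nx) not in group:
                group.add((ny, nx))
                stack.append((ny, nx))
    return len(libs)


def liberty_plane(board):
    """Hypothetical extra input plane: liberty count at each stone, 0 elsewhere."""
    return [[liberties(board, y, x) if c != "." else 0 for x, c in enumerate(row)]
            for y, row in enumerate(board)]


print(liberty_plane(["BB.", "W..", "..."]))  # [[2, 2, 0], [2, 0, 0], [0, 0, 0]]
```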

Page 25

Summary

We proposed an evaluation function based on a multi-layer neural network. Its key features are local connections between units and parameter sharing that accounts for invariance in Go positions. Using game records, we obtained good learning results for end positions and for predicting the probability of territory.