2004/11/13 GPW2004 1

What Shogi Programs Still Cannot Do - A New Test Set for Shogi -

Reijer Grimbergen and Taro Muraoka

Department of Informatics

Yamagata University

Outline

The importance of testing

Test sets for chess

Test sets for shogi

A new test set for shogi

Problem area analysis

Some new results

Differences between humans and computers

Conclusions and future work

The importance of testing

Game programming
• A program should play strongly
• More common is the reverse approach: minimize the number of bad moves
• Testing can help determine problem areas

Incremental testing
• Save positions that the program did not handle well
• Drawbacks
  • Test set is program-specific
  • Positions selected subjectively

The importance of testing

The requirements of a test set
• Testing a wide variety of potential problem areas
• Not specific to one program

Test design in games
• Mainly done for chess
• Current test sets for shogi have shortcomings
• Shogi research is at a point where focusing the effort could be a great help
• Proposing a new test set for shogi

Test sets for chess

The Bratko-Kopec test set
• 12 tactical positions and 12 strategic positions
• Designed to compare human and computer performance in chess
• Thus far, no program can solve all positions

Reinfeld's Win at Chess
• 300 tactical positions
• Used as a first test for new programs

LCT II
• 35 positions
• Good balance between strategic, tactical and endgame positions
• An Elo rating can be calculated from the solved positions

The Lindner test set
• A set of positions that are considered hard for computers to solve

Test sets for shogi

The Matsubara-Iida test set
• 48 positions taken from professional games
• Selected by an expert player
• Aims at judging the strength of shogi programs
• First given to human players to establish a connection with playing strength

Problems with the Matsubara-Iida test set
• The strength of a program can be judged more accurately by playing on the internet
• No Elo calculation like in LCT II
• Subjective selection leaves doubts about test balance
• What is difficult for computers is not necessarily difficult for humans and vice versa, so the connection with playing strength is unreliable

Test sets for shogi

Other test sets for shogi
• Yamashita's test set (10 positions)
• Tanase's test set (19 positions)

Problems with these test sets
• Too small
• Program-specific
• Unclear if there is only one solution

What do we want from a test set?

1. As general as possible
2. Points to as many problem areas as possible

Find positions that cannot be solved by the best programs

Finding weaknesses instead of measuring strength

A new test set for shogi

Positions selected from Shukan Shogi
• Every week six next-move problems
• Middle game positions and endgame positions
• Different tactical themes: winning material, attack, defense and mating
• Our goal: create a test set of 100 positions

The programs we used
• AI Shogi 2003
• Todai Shogi 5
• Gekisashi 2

Conditions
• 30 seconds on a 2 GHz Pentium 4

This was not easy!
• More than 1500 positions needed to be checked to find our test set

Additional feature
• The percentage of respondents who solved the problem is given
• Differences between what is difficult for humans and difficult for computers
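
The selection step described above (run each program on a candidate position under the time limit and keep the position only if none of them finds the published solution) can be sketched as follows. This is a minimal sketch, not the authors' tooling: Problem, Engine and select_unsolved are hypothetical names, and comparing only the first move of the solution is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical engine interface: takes a position string and a time limit in
# seconds, and returns the move the program would play.
Engine = Callable[[str, float], str]


@dataclass(frozen=True)
class Problem:
    problem_id: str        # label of the problem, e.g. "750-3"
    position: str          # position in whatever encoding the engines accept
    solution_move: str     # first move of the published solution (assumption)
    human_solved_pct: int  # percentage of respondents who solved the problem


def select_unsolved(problems: List[Problem],
                    engines: Dict[str, Engine],
                    seconds: float = 30.0) -> List[Problem]:
    """Keep only the problems that no engine solves within the time limit."""
    selected = []
    for problem in problems:
        replies = [engine(problem.position, seconds) for engine in engines.values()]
        if all(reply != problem.solution_move for reply in replies):
            selected.append(problem)
    return selected
```

With stub engines standing in for the real programs, a position survives the filter only when every stub misses the solution move.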

Problem area analysis

Why are the positions difficult?
• Using the analysis tools in Todai Shogi, Gekisashi and AI Shogi to find problem areas

Our first analysis indicates seven problem areas
• Horizon effect due to consecutive checks
• Not calling the tsume shogi solver deep in the search tree
• Inaccurate evaluation function
• Incorrect forward pruning
• Mate with unpromoted pieces
• Insufficient hardware speed
• Problems with time allocation

Problem area analysis
Horizon effect and tsume shogi

Problem 750-3
• Solved: 16%

Solution
• S2d, K1d (Px2d, G2c, Kx2c, B3b+), G3e

Program replies
• Todai: P1e (losing position)
• Gekisashi: B3b+ (white has the advantage)
• AI Shogi: G3e

The problem
• Horizon checks after S2d, K1d, G3e
• The same position without horizon checks can be solved by all programs

Another problem: tsume shogi deep in the search tree

Gekisashi with more time
• S2d, K1d, G3e, S7i, Kx7i, N2e, P1e, +Bx1e, Sx1e (-1192)
• White has a mate in 9 after Kx7i and black has a mate in 3 after N2e!

Problem area analysis
Evaluation and forward pruning

Solution
• G2b, Gx2b, B2c+, G3c, +Bx3c

Program replies
• Todai: B2a+, K4a, G6a (winning position)
• Gekisashi: S6h, +S5f, N3g, S6f, N2e, P5d, B2a+, K4a (black is winning)
• AI Shogi: S6h, +S5h, B2a+, K4a

Problem area analysis
Evaluation and forward pruning

The problem: an incorrect evaluation
• After B2a+, K4a the white king can escape, but the programs cannot assess this
• Evaluating the chances of escaping an attack is difficult?

Another problem: forward pruning
• Consecutive sacrifices G2b and B2c+
• Multiple sacrifices not searched deep enough?

Problem area analysis
Unpromoted pieces

Solution
• P1c= (no promotion), S2f (straight), K1d (P1d is an illegal move)

Program replies
• Todai: +P5b (losing position)
• Gekisashi: N8d (white is winning)
• AI Shogi: resigns (!)

Problem area analysis
Unpromoted pieces

The problem here seems to be a special case of forward pruning
• Promoting a major piece or a pawn is almost always better than not promoting
• Non-promotions of these pieces are pruned to improve search efficiency

Not a high priority problem, but could have consequences for thinking in opponent time
• When there is no difference between promoting and not promoting a piece, playing the non-promotion makes the opposing program's thinking in opponent time useless
• My advice: play the non-promotion to win some time!
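
The pruning rule described above can be sketched as a filter in move generation. This is a minimal sketch under stated assumptions: Move, PRUNED_NON_PROMOTIONS and promotion_choices are hypothetical names, forced promotions are ignored, and the real programs may implement the rule differently. It only illustrates why an unpromoted pawn move never enters the search while the rule is active.

```python
from typing import List, NamedTuple


class Move(NamedTuple):
    piece: str      # piece letter, e.g. "P" (pawn), "R" (rook), "B" (bishop), "S" (silver)
    square: str     # destination square, e.g. "1c"
    promotes: bool  # True if the piece promotes on this move


# Assumption: non-promotions are pruned for pawns and the two major pieces.
PRUNED_NON_PROMOTIONS = frozenset({"P", "R", "B"})


def promotion_choices(piece: str, square: str, can_promote: bool,
                      prune: bool = True) -> List[Move]:
    """Return the promotion variants of one move that the search will consider."""
    if not can_promote:
        return [Move(piece, square, False)]
    choices = [Move(piece, square, True)]
    # The non-promotion is generated unless pruning is on and the piece is
    # one of those for which promoting is almost always better.
    if not (prune and piece in PRUNED_NON_PROMOTIONS):
        choices.append(Move(piece, square, False))
    return choices
```

Calling promotion_choices("P", "1c", True) yields only the promoting move, so a solution that starts with the unpromoted pawn is invisible to the search; passing prune=False restores it at the cost of a larger move list.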

Problem area analysis
Other problem areas

Insufficient hardware speed
• Some positions could be solved by giving the program more time
• Improved hardware speed will automatically solve these positions

Time allocation
• In some positions, the programs would play very quickly
• These positions were deleted from our test set
• However, it might be a different problem area: when to cut off the search?

Problem area analysis
Overview

Problem area                      Positions
Insufficient hardware speed       31
Inaccurate evaluation function    20
Incorrect forward pruning         19
Horizon effect                    18
Tsume shogi                       11
Mate using unpromoted pieces       6
Reason unclear                     7
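
As a small arithmetic check on the overview table: the counts sum to 112, more than the 100 positions in the test set, which suggests that some positions were counted under more than one problem area. That reading is an inference from the numbers, not something the slides state. The function names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Counts copied from the overview table.
PROBLEM_AREA_COUNTS: Dict[str, int] = {
    "Insufficient hardware speed": 31,
    "Inaccurate evaluation function": 20,
    "Incorrect forward pruning": 19,
    "Horizon effect": 18,
    "Tsume shogi": 11,
    "Mate using unpromoted pieces": 6,
    "Reason unclear": 7,
}


def total_assignments(counts: Dict[str, int]) -> int:
    """Total number of (position, problem area) assignments in the table."""
    return sum(counts.values())


def ranked(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Problem areas sorted from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```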

Some new results

New program versions have been released
• Todai Shogi 6 and 7, Gekisashi 3 and AI Shogi 2004

Results of Todai 6 on the test set
• Solved 6 positions
• The problem areas of these positions were different
  • Inaccurate evaluation function (2 positions)
  • Insufficient hardware speed (2 positions)
  • Horizon effect (1 position)
  • Reason unclear (1 position)

Differences between humans and computers

How difficult are the positions for human players?
• Almost half of the positions (46) can be solved by more than 50% of the human respondents
• There are 14 positions that computers cannot solve but that more than 80% of the humans can

Human percentage    Positions
0 - 10%              0
11 - 20%            12
21 - 30%            18
31 - 40%            10
41 - 50%            13
51 - 60%            16
61 - 70%             7
71 - 80%             9
81 - 90%             9
91 - 100%            5
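
The two figures quoted on this slide can be checked against the table with a few lines of arithmetic. The sketch below uses hypothetical names; it also shows that the bins as transcribed here sum to 99 rather than 100.

```python
from typing import List, Tuple

# (lower bound %, upper bound %, number of positions), copied from the table.
HUMAN_SOLVE_BINS: List[Tuple[int, int, int]] = [
    (0, 10, 0), (11, 20, 12), (21, 30, 18), (31, 40, 10), (41, 50, 13),
    (51, 60, 16), (61, 70, 7), (71, 80, 9), (81, 90, 9), (91, 100, 5),
]


def positions_above(threshold_pct: int) -> int:
    """Count positions in bins that lie entirely above the given percentage."""
    return sum(count for low, _high, count in HUMAN_SOLVE_BINS if low > threshold_pct)


def total_positions() -> int:
    """Total number of positions listed in the table."""
    return sum(count for _low, _high, count in HUMAN_SOLVE_BINS)
```

positions_above(50) gives 46 and positions_above(80) gives 14, matching the numbers in the text.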

Conclusions and future work
• We have proposed a set of 100 positions that is general and points to specific problem areas in computer shogi
• As more positions get solved, we intend to replace them with new positions
• Further investigation of the unsolved positions for which the problem could not be determined
• Making further comparisons between what is difficult for humans and difficult for computers

Finally

Download the test set here

gamelab.yz.yamagata-u.ac.jp/RESEARCH/shogitestset.zip

Let me know about your results
