Neural Networks and their Application to Go

Anne-Marie Bausch
ETH, D-MATH
May 31, 2016
Table of Contents

1 Neural Networks: Theory; Training neural networks; Problems
2 AlphaGo: The Game of Go; Policy Network; Value Network; Monte Carlo Tree Search
Perceptron

A perceptron is the most basic artificial neuron (developed in the 1950s and 1960s).

The input is X ∈ R^n, the w_1, . . . , w_n ∈ R are called weights, and the output is Y ∈ {0, 1}. The output depends on some threshold value τ:

output = 0, if W · X = ∑_j w_j x_j ≤ τ,
         1, if W · X = ∑_j w_j x_j > τ.
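The threshold rule can be sketched in a few lines of Python (the inputs, weights, and threshold below are made-up illustration values):

```python
def perceptron(x, w, tau):
    """Fire (return 1) iff the weighted sum of the inputs exceeds tau."""
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum > tau else 0

# With equal weights and threshold 1.5, two binary inputs
# behave like a logical AND gate:
print(perceptron([1, 1], [1, 1], 1.5))  # 1
print(perceptron([1, 0], [1, 1], 1.5))  # 0
```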
Bias
Next, we introduce what is known as the perceptron’s bias B,
B := −τ.
This gives us a new formula for the output,
output = 0, if W · X + B ≤ 0,
         1, if W · X + B > 0.
Example
NAND gate:
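With the bias formulation, concrete numbers realise the NAND gate; the choice of weights (−2, −2) and bias 3 is one standard example, not the only one:

```python
def perceptron(x, w, b):
    """Output 1 iff W · X + B > 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Weights (-2, -2) and bias 3 give a NAND gate: the output is 0
# only when both inputs are 1.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (-2, -2), 3))
```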
Sigmoid Neuron

Problem: A small change in the input can change the output a lot.
→ Solution: the sigmoid neuron

Input X ∈ R^n
Output = σ(X · W + B) = (1 + exp(−X · W − B))^(−1) ∈ (0, 1), where σ(z) := 1/(1 + exp(−z)) is called the sigmoid function.
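A minimal sketch of a sigmoid neuron; the weights and bias reuse the NAND-gate values purely for illustration:

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)); squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Unlike the perceptron, a small change in the input now only
# changes the output a little:
print(sigmoid_neuron([1.0, 1.0], [-2.0, -2.0], 3.0))
print(sigmoid_neuron([1.0, 0.99], [-2.0, -2.0], 3.0))
```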
Neural Networks

Given an input X, as well as some training and testing data, we want to find a function f_{W,B} such that f_{W,B} : X → Y, where Y denotes the output.

How do we choose the weights and the bias?
Example: XOR Gate
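The slide's figure is not reproduced here, but the idea can be sketched: a single perceptron cannot compute XOR (it is not linearly separable), while a two-layer network of NAND perceptrons can:

```python
def nand(a, b):
    # NAND as a single perceptron: weights (-2, -2), bias 3.
    return 1 if -2 * a - 2 * b + 3 > 0 else 0

def xor(a, b):
    # XOR built from four NAND gates arranged in two layers.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))  # outputs 0, 1, 1, 0
```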
Learning Algorithm

A learning algorithm chooses the weights and biases without interference from the programmer.

The smoothness of σ gives

Δoutput ≈ ∑_j (∂output/∂w_j) Δw_j + (∂output/∂B) ΔB
How to update weights and bias

How does the learning algorithm update the weights (and the bias)? It solves

argmin_{W,B} ‖f_{W,B}(X) − Y‖²

→ One method to do this is gradient descent
→ Choose an appropriate learning rate!
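The update rule can be sketched on a toy one-neuron problem; the training pair, initial weights, and learning rate below are made-up illustration values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy problem: fit f(x) = sigmoid(w*x + b) to a single training
# pair (x, y) by minimising the squared error (f(x) - y)^2.
x, y = 1.0, 0.0
w, b, eta = 2.0, 0.0, 0.5   # eta is the learning rate

for step in range(100):
    out = sigmoid(w * x + b)
    err = out - y
    dsig = out * (1 - out)          # sigma'(z) = sigma(z)(1 - sigma(z))
    w -= eta * 2 * err * dsig * x   # dLoss/dw
    b -= eta * 2 * err * dsig       # dLoss/dB

print(sigmoid(w * x + b))  # now close to the target y = 0
```

If the learning rate is too large the iterates overshoot; too small and convergence is needlessly slow, which is why the slide stresses choosing it appropriately.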
Example

Digit recognition (1990s) → YouTube video
Example

One image consists of 28 × 28 pixels, which explains why the input layer has 28 · 28 = 784 neurons.
3 main types of learning

Supervised Learning (SL)
→ Learning some mapping from inputs to outputs.
Example: classifying digits

Unsupervised Learning (UL)
→ Given input and no output, what kinds of patterns can you find?
Example: visual input is at first too complex, so the number of dimensions has to be reduced

Reinforcement Learning (RL)
→ The learning method interacts with its environment by producing actions a_1, a_2, . . . that produce rewards or punishments r_1, r_2, . . . .
Example: human learning
Why was there a recent boost in the employment of neural networks?

The evolution of neural networks stagnated because networks with more than 2 hidden layers proved to be too difficult to train. The main problems and their solutions are:

Huge amount of data needed → Big Data
Number of weights exceeded the capacity of computers → the capacity of computers improved (parallelism, GPUs)
Theoretical limits → difficult (⇒ see next slide)
Theoretical Limits

Back-propagated error signals either shrink rapidly (exponentially in the number of layers) or grow out of bounds.

3 solutions:
(a) Unsupervised pre-training ⇒ facilitates subsequent supervised credit assignment through back-propagation (1991).
(b) LSTM-like networks (since 1997) avoid the problem through a special architecture.
(c) Today, fast GPU-based computers allow for propagating errors a few layers further down within reasonable time.
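A back-of-the-envelope sketch of the shrinking case, assuming sigmoid activations and weights of order 1:

```python
# Back-propagation multiplies in one factor of sigma'(z) per layer.
# Since sigma'(z) = sigma(z)(1 - sigma(z)) <= 1/4, with weights of
# order 1 the error signal shrinks at least exponentially with depth.
MAX_SIGMOID_SLOPE = 0.25  # maximum of sigma', attained at z = 0

for depth in [2, 5, 10, 20]:
    print(f"{depth} layers: error scaled by at most "
          f"{MAX_SIGMOID_SLOPE ** depth:.1e}")
```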
The Game of Go
Main rules

Origin: ancient China, more than 2500 years ago
Goal: gain the most points
White gets 6.5 points (komi) for moving second
Get points for territory at the end of the game
Get points for prisoners
→ A stone is captured when it has no more liberties (liberties are its "supply chains": the empty points adjacent to it)
It is not allowed to commit suicide
Ko rule: it is not allowed to play a move that recreates the previous board position
End of Game

The game is over when both players have passed consecutively.
→ Prisoners are removed and points are counted!
AlphaGo

DeepMind was founded in 2010 as a startup in Cambridge
Google bought DeepMind for $500M in 2014
AlphaGo beat the European Go champion Fan Hui (2-dan) in October 2015
AlphaGo beat Lee Sedol (9-dan), one of the best players in the world, in March 2016 (winning 4 out of 5 games)
A victory of AI in Go had been thought to be 10 years in the future
1920 CPUs and 280 GPUs were used during the match against Lee Sedol
→ This equals around $1M, not counting the electricity used for training and playing
The next game targeted by Google DeepMind: StarCraft
AlphaGo

Difficulty: the search space of future Go moves is larger than the number of particles in the known universe.

AlphaGo's three main components:
Policy Network
Value Network
Monte Carlo Tree Search (MCTS)
Policy Network Part 1

A multi-layered neural network
Supervised learning (SL)
Goal: look at the board position and choose the next best move (it does not care about winning, just about the next move)
It is trained on millions of example moves made by strong human players on KGS (Kiseido Go Server)
It matches strong human players about 57% of the time (mismatches are not necessarily mistakes)
Policy Network Part 2

There are 2 additional versions of the policy network: a stronger move picker and a faster move picker.

The stronger version uses RL
→ It is trained more intensively by playing games to the end (millions of training games against previous editions of itself); it does no reading, i.e., it does not try to simulate any future moves
→ It is needed for creating enough training data for the value network

The faster version is called the "rollout network"
→ It does not look at the entire board but at a smaller window around the previous move
→ It is about 1000 times faster!
Value Network

A multi-layered neural network
Estimates the probability of each player winning the game
Is useful for speeding up reading: if a particular position is bad, AlphaGo can skip any more moves along that line of play
Trained on millions of example board positions which were randomly picked from games between two copies of AlphaGo's strong move picker
MCTS

MCTS accomplishes reading and exploring.

The full-power AlphaGo system then uses all of its "brains" in the following way:
→ Choose a few possible next moves using the basic move picker (the stronger version made AlphaGo weaker!)
→ Evaluate each next move using the value network and a deeper MC simulation (called a "rollout", which uses the fast move picker)
→ This gives 2 independent guesses → use a parameter to combine the 2 guesses (the optimal parameter is 0.5)
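The final mixing step can be sketched as follows; the value network and rollout are stubbed out with hypothetical placeholder functions, and only the mixing formula with parameter 0.5 comes from the text:

```python
def evaluate_position(position, value_net, rollout, mix=0.5):
    """Combine the value network's estimate with a Monte Carlo rollout
    result; a mixing parameter of 0.5 was found to work best."""
    v = value_net(position)   # estimated probability of winning
    z = rollout(position)     # outcome of a fast-rollout simulation
    return (1 - mix) * v + mix * z

# Placeholder "networks" for illustration only:
print(evaluate_position("some position",
                        value_net=lambda p: 0.6,
                        rollout=lambda p: 1.0))
```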
How the strength of AlphaGo varies
References

Mastering the game of Go with deep neural networks and tree search, Nature, Volume 529, 2016
http://neuralnetworksanddeeplearning.com/chap1.html
https://www.dcine.com/2016/01/28/alphago/
Wikipedia: Go (game)