Your mission
Goal: Learn to achieve reward through an optimal sequence of actions
The Enemy: Temporal credit assignment
Reinforcement Learning
- Unsupervised agent
- Takes actions in an environment
- FEEDBACK: consequences of actions alter the model
  - Applied backwards in time at a decreasing, tunable rate (see the sketch below)
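A minimal sketch of this loop; `Environment`, `Agent`, and their methods are hypothetical placeholders, not from the slides:

```python
def run_episode(env, agent):
    """Unsupervised agent takes actions; feedback alters its model."""
    state = env.reset()
    history = []                       # trajectory for later credit assignment
    done = False
    reward = 0.0
    while not done:
        action = agent.act(state)      # no labeled targets, only consequences
        state, reward, done = env.step(action)
        history.append((state, action))
    # FEEDBACK: the outcome is applied backwards in time, with influence
    # decaying at a tunable rate (the lambda of TD(lambda), defined later)
    agent.update(history, reward)
```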
Temporal Credit Assignment Problem
- Multiple actions taken to achieve a goal
- Which were responsible for success?
- What is (partial) success?
Random Evaluation Function?!?!
- Error signal at each step (sketched below)
- ... from the network itself
- ... even on untrained networks
- Final unambiguous reward signal: win or loss
- Tilts the randomness a little toward accurate learning
  - (in several thousand games)
- Initially took thousands of random moves just to finish a game
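A minimal sketch of where those per-step error signals come from, assuming a hypothetical `value(position)` forward pass that returns an estimated probability of winning:

```python
def td_errors(value, positions, won):
    """Per-step TD error signals for one self-played game.

    `value` is a hypothetical network forward pass returning P(win);
    `positions` is the sequence of board positions in the game.
    """
    errors = []
    for t in range(len(positions) - 1):
        # The target is the network's own next prediction, so there is an
        # error signal at every step, even from an untrained network
        errors.append(value(positions[t + 1]) - value(positions[t]))
    # The final, unambiguous reward (win = 1, loss = 0) is the only signal
    # grounded in truth; over many games it tilts learning toward accuracy
    errors.append((1.0 if won else 0.0) - value(positions[-1]))
    return errors
```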
TD-Gammon vs. Neurogammon
TD-Gammon’s model
At first:
- Only inputs were board positions
- 40-80 hidden units (sketched below)
- Equalled the performance of Neurogammon after 200,000 self-played games

Then:
- Added human-identified features as additional inputs
- Became invincible (nearly)
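A minimal sketch of that first model. The 198-unit raw board encoding is the figure commonly cited in the TD-Gammon literature, and the initialization here is illustrative, not from the slides:

```python
import numpy as np

N_INPUTS, N_HIDDEN = 198, 40    # 198-unit board encoding is an assumption from
                                # the literature; 40-80 hidden units per the slide

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_INPUTS))   # illustrative initialization
W2 = rng.normal(0.0, 0.1, (1, N_HIDDEN))

def value(board):
    """Estimated probability of winning from a raw board-position encoding."""
    hidden = sigmoid(W1 @ board)
    return sigmoid(W2 @ hidden)[0]
```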
TD(λ) function
For each output unit Y:

    w_{t+1} − w_t = α (Y_{t+1} − Y_t) Σ_{k=1}^{t} λ^{t−k} ∇_w Y_k

where:

t               Model state at the end of the last step
t + 1           Model state at the beginning of the next step
w               Vector of neural network connection weights
α               “learning rate” – exploration speed of the problem space
λ               Feedback rate ∈ (0, 1) – weighted error applied to past choices
Y_{t+1} − Y_t   Error signal at the current state
Y_k             History of Y’s value from the first step (random) to the last step
∇_w             Gradient with respect to the network weights – direction of steepest ascent
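In practice the sum over the history is not recomputed each step; it is maintained incrementally as an eligibility trace e_t = λ e_{t−1} + ∇_w Y_t, which equals the sum above. A minimal NumPy sketch, where `grad_t` stands in for a hypothetical ∇_w Y_t obtained by backpropagation:

```python
import numpy as np

def td_lambda_step(w, e, y_t, y_next, grad_t, alpha=0.1, lam=0.7):
    """One TD(lambda) update; returns the new weights and eligibility trace.

    w       flattened weight vector
    e       eligibility trace, same shape as w (start at zeros)
    grad_t  gradient of the output Y wrt w at step t (hypothetical backprop)
    """
    e = lam * e + grad_t           # running sum of lambda^(t-k) * grad Y_k
    delta = y_next - y_t           # error signal at the current state
    w = w + alpha * delta * e      # weighted error applied to past choices
    return w, e
```

This is called once per move; at the end of a game, y_next is replaced by the final win/loss reward, grounding the whole trace in the one unambiguous signal.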
Advantages of unsupervised TD learning
That is, advantages in backgammon specifically
- Can train continuously
- Not subject to human biases
- Has its own biases (exploring too small a part of the state space)
  - Occurred in checkers and Go
  - The dice roll helps eliminate this
- The dice roll also smooths out the evaluation function
- Easy concepts are linear with respect to the variables
  - (hidden variables don't help)