Lesson 12: Reinforcement Learning for Critterbot Science 8

Reinforcement Learning

Science 8 Unit B: Cells and Systems (Nature of Science Emphasis)

Introduction

> What does it mean to have a behaviour reinforced?

> Let’s look at a famous example first...


Ivan Pavlov (1849-1936)

> Born in Russia in 1849, Ivan Pavlov abandoned a religious career for which he had been preparing, and instead went into science.

> His studies of the mechanisms underlying the digestive system in mammals had a great impact on the field of physiology (the study of the mechanical, physical, and biochemical functions of living organisms).

Source: Nobelprize.org

http://en.wikipedia.org/wiki/Physiology

http://nobelprize.org/educational_games/medicine/pavlov/readmore.html


> Pavlov was awarded the Nobel Prize in Physiology or Medicine in 1904 for his work on digestion. Around the same time, he was also studying reflexes, in particular in dogs.

> His discoveries helped lay the foundations for the scientific study of behaviour.




> Pavlov became interested in studying reflexes when he noticed that dogs sometimes drooled even without food being shown to them.

> Although no food was in sight, their saliva still dribbled. It turned out that the dogs were reacting to lab coats.




> Every time the dogs were served food, the person who served the food was wearing a lab coat.

> The lab coats became a “stimulus”.




> A stimulus is anything capable of evoking a response in an organism.

> Examples of stimuli include sights, sounds, heat, cold, smells, or other sensations.

> Therefore, the dogs reacted as if food were on its way whenever they saw a lab coat.



> In a series of experiments, Pavlov then tried to figure out why this was happening.

> For example, he struck a bell when the dogs were fed. If the bell was sounded close to meal time, the dogs learnt to associate the sound of the bell with food.

> After a while, the sound of the bell alone caused them to drool.
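Pavlov described his results in words. Much later, in 1972, the psychologists Robert Rescorla and Allan Wagner proposed a simple mathematical model of this kind of learning: the bell has an "association strength" that grows a little each time it is paired with food. The Python sketch below uses that model for a single stimulus (the bell). It is an illustration added for this lesson, not Pavlov's own method, and the learning rate (0.3) and maximum strength (1.0) are made-up values.

```python
def condition(trials, learning_rate=0.3, max_strength=1.0):
    """Return the bell-food association strength after each pairing.

    Single-stimulus form of the Rescorla-Wagner update (illustrative values):
        strength <- strength + learning_rate * (max_strength - strength)
    """
    strength = 0.0  # before training, the bell means nothing to the dog
    history = []
    for _ in range(trials):
        # Each bell + food pairing closes part of the remaining gap to the maximum.
        strength += learning_rate * (max_strength - strength)
        history.append(round(strength, 3))
    return history


print(condition(5))  # strength climbs toward 1.0 with each pairing
```

Each pairing adds less than the one before, so the association builds quickly at first and then levels off close to 1.0.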



More on Pavlov's Dog

> You can read more about Pavlov’s dog and see if you can train a dog to drool on command online at the Nobel Prize website.


Reinforcement Learning

> Dogs are often trained through a method of reinforcement.

> For example, if a dog sits when it hears the word "sit" and then receives a treat, he or she will learn that sitting on command earns a treat.

> In fact, almost all animals can learn through reinforcement.


> Definition: Reinforcement occurs when an event following a response causes an increase in the probability of that response occurring in the future.

> So when a dog hears "sit" (stimulus), sits down (response), and receives a treat (event), the dog will be more likely to sit in the future in hopes of receiving another treat.
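The definition talks about the probability of a response going up. One very simple way to picture this, assumed here only for illustration, is to give each possible response a weight and choose responses in proportion to their weights; every reinforced response gets a little more weight. The function names `reinforce` and `probability` and the starting weights are hypothetical.

```python
def reinforce(preferences, response, amount=1.0):
    """Add weight to a response that was followed by a reward (a treat)."""
    preferences[response] += amount


def probability(preferences, response):
    """Chance of choosing `response`, in proportion to its weight."""
    return preferences[response] / sum(preferences.values())


# Hypothetical starting point: the dog is equally likely to sit or not.
dog = {"sit": 1.0, "ignore": 1.0}
print(probability(dog, "sit"))  # 0.5

for _ in range(3):  # three times: hears "sit", sits, gets a treat
    reinforce(dog, "sit")
print(probability(dog, "sit"))  # 0.8
```

Nothing forces the dog to sit; the treats only make sitting more likely, which is exactly what the definition of reinforcement describes.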

http://en.wikipedia.org/wiki/Reinforcement


> If animals (including humans) can learn by reinforcement, can a machine also learn through reinforcement?

> Computing scientists at the Centre for Machine Learning believe so, and they are building a robot that learns through reinforcement.


> The robot is called “Critterbot”.

> The robot responds to stimuli in the environment.

> For lessons on Critterbot, see Critterbot for Physics 30 and Critterbot for Science 8.

http://www.uofaweb.ualberta.ca/cmaste/pdfs/AICML13CritterbotPhysics30.ppt#Introduction

http://www.uofaweb.ualberta.ca/cmaste/pdfs/AICML14CritterbotScience8.ppt

How can a Machine be Reinforced?

> In reinforcement learning (a type of machine learning, which is itself a branch of artificial intelligence), the "learner" is a computer that learns by trying to obtain as much reward as possible.

> So what does a computer or robot want as a reward? Just a number.



> A positive reward is the number 1.

> A neutral reward is the number 0.

> A negative reward is the number -1.
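In code, the reward a learner receives really is just a number. The sketch below assumes three made-up outcomes for a robot (the outcome names are hypothetical, not taken from Critterbot) and adds up the rewards over several steps, since the learner is trying to make this total as large as possible.

```python
# Hypothetical outcomes for a robot; only the numbers 1, 0 and -1 come from the lesson.
REWARDS = {
    "reached_charger": 1,    # positive reward
    "nothing_happened": 0,   # neutral reward
    "bumped_wall": -1,       # negative reward
}


def total_reward(outcomes):
    """Add up the reward numbers the learner receives over several steps."""
    return sum(REWARDS[outcome] for outcome in outcomes)


print(total_reward(["reached_charger", "nothing_happened", "bumped_wall", "reached_charger"]))  # 1
```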


> What separates Reinforcement Learning from other forms of artificial intelligence is that the learner is never told what actions to take.

> The learner uses a trial-and-error approach: if an action earns a positive reward, the learner will continue taking that action.

> But if it receives a negative reward, it will learn to avoid that action.
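The trial-and-error idea can be sketched with a learner that keeps the average reward it has received for each action, usually repeats the action with the best average, and now and then tries a random action. This is one common strategy (often called "epsilon-greedy"); the lesson does not say which method Critterbot uses, and the toy world, action names, and settings below are hypothetical.

```python
import random

# Hypothetical toy world: each action always gives the same reward.
REWARD_FOR = {"drive_to_charger": 1, "wait": 0, "drive_into_wall": -1}


def learn(steps=100, explore=0.1, seed=0):
    """Trial-and-error learning: keep an average reward for each action,
    usually repeat the best one, and sometimes try a random one."""
    rng = random.Random(seed)
    actions = list(REWARD_FOR)
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}

    for step in range(steps):
        if step < len(actions):
            action = actions[step]            # try every action once first
        elif rng.random() < explore:
            action = rng.choice(actions)      # occasionally explore at random
        else:
            # otherwise repeat the action with the best average reward so far
            action = max(actions, key=lambda a: totals[a] / counts[a])
        reward = REWARD_FOR[action]           # the world only hands back a number
        totals[action] += reward
        counts[action] += 1
    return counts


print(learn())  # how many times each action was chosen
```

No one tells the learner which action is correct; it only discovers that `drive_into_wall` is a bad idea by trying it and receiving -1, after which it mostly avoids it.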

Questions

1. How is a robot that uses Machine Learning different from a robot that is programmed for specific tasks? – Answer: A robot that uses reinforcement learning is not told what actions to take. It learns by trial and error.


2. A robot in a car factory is designed to build cars at a fast rate. Would Machine Learning be a good application for a car-building machine? Why or why not?

Answer: No, probably not. Robots that build cars follow specific designs to ensure they build exactly as they are told, and learning by trial and error would mean making mistakes along the way.


3. Are dogs the only animals that respond to a stimulus by salivating? For example, what happens to you when you are just about to put a pickle in your mouth? Or mustard? Or a sour candy? – Answer: No. Humans also respond to visual stimuli and will salivate at the sight of certain foods, such as a pickle or a sour candy.


4. Critterbot was designed to respond to stimuli (plural of stimulus). Imagine that you had to design a robot that will automatically shovel snow from your driveway every winter.

– The robot cannot have any human assistance; it has to be autonomous (work on its own).

– First, come up with a 'cool' name for your robot.

– Use drawings and written descriptions to write up a one-page explanation of how your robot would work.

continued...

Question 4 continued.

– What types of sensors would it need to have to work without your assistance? Remember, it is only going to shovel your driveway, and not wander down the street shovelling every driveway.

– Animals require energy and use special systems to convert food into energy. For example, the digestive system takes in food and digests it to extract energy and nutrients.

– How will your robot get its energy? Remember, it has to work in winter conditions, most often when it is snowing.

Centre for Mathematics Science and Technology Education (CMASTE)
382 Education South
University of Alberta
Edmonton AB T6G 2G5
www.CMASTE.ca
To download: select Outreach, Alberta Ingenuity Resources and Centre for Machine Learning
Filename: AICML6BrainTumourAnalysis

Centre for Machine Learning
Department of Computing Science
University of Alberta
2-21 Athabasca Hall
Edmonton AB T6G 2E8
(780) 492-4828
www.machinelearningcentre.ca

Alberta Ingenuity
2410 Manulife Place, 10180-101 Street
Edmonton AB T5J 3S4
(780) 423-5735
www.albertaingenuity.ca
