Train Your Dog
![Page 1: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/1.jpg)
Learning from how dogs learn
Prof. Bruce Blumberg
The Media Lab, MIT
[email protected]@media.mit.edu
www.media.mit.edu/~bruce
![Page 2: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/2.jpg)
About me…
![Page 3: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/3.jpg)
About me…
![Page 4: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/4.jpg)
Practical & compelling real-time learning
• Easy for interactive characters to learn what they ought to be able to learn
• Easy for a human trainer to guide learning process
• A compelling user experience
• Provide heuristics and practical design principles
![Page 5: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/5.jpg)
My bias & focus
• Learning occurs within an innate structure that biases…
  • Attention
  • Motivation
  • Innate frequency, form and organization of behavior
  • When certain things are most easily learned
• What are the catalytic components of the scaffolding that make learning possible?
![Page 6: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/6.jpg)
sheep|dog: trial by eire
See sheep|dog video on my website
![Page 7: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/7.jpg)
Object persistence
See object persistence video on my website
![Page 8: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/8.jpg)
Temporal representation
See temporal representation (aka Goatzilla) video on my website
![Page 9: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/9.jpg)
Alpha Wolf
See alpha wolf video on my website
![Page 10: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/10.jpg)
Rover@home
See rover@home video on my website or go to Scientific American Frontiers website
![Page 11: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/11.jpg)
Dobie T. Coyote Goes to School
See Dobie video on my website
![Page 12: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/12.jpg)
Why look at Dog Training?
• Interactive characters pose unique challenges:
  • State, action and state-action spaces are often continuous and far too big to search exhaustively
  • To be compelling, characters must
    • Learn “obvious” contingencies between state, actions and consequences quickly
    • Be easy to train without visibility into internal state of character
  • Learning is only one thing they have to do
• Dogs and their trainers seem to solve these problems easily
![Page 13: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/13.jpg)
Invaluable resources
• Doing it, and talking to people who do it.
• Wilkes, Pryor, Ramirez
• Lindsay, Burch & Bailey, Mackintosh
• Lorenz, Leyhausen, Coppinger & Coppinger
![Page 14: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/14.jpg)
The problem facing dogs (real and synthetic)
Set of all possible actions
Set of all motivational goals
Set of all possible stimuli
What do I do, when, in order to best satisfy my motivational goals?
![Page 15: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/15.jpg)
The space of possible stimuli is wicked big
Set of all possible stimuli
State Space: modality of stimuli (smells, motion, sounds, dog sounds, speech, whistles) by time of occurrence
![Page 16: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/16.jpg)
The space of possible actions is also very big
Set of all possible actions
Action Space: action (figure-8, shake, low shake, high-5, beg, down, left ear twitch) by time of performance
![Page 17: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/17.jpg)
Who gets credit for good things happening?
Yumm..
Action: figure-8, shake, low shake, high-5, beg, down, left ear twitch
Modality of stimuli: motion, sounds, dog sounds, speech, whistles
![Page 18: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/18.jpg)
Who gets credit for good things happening?
orient → eye → stalk → chase → grab-bite → kill-bite → Yumm.. (over time)
![Page 19: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/19.jpg)
Conventional idea: back propagation from goal
orient → eye → stalk → chase → grab-bite → kill-bite → Yumm.. (over time)
Credit flows backward
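The "credit flows backward" idea can be illustrated with a minimal Python sketch. It is not the system described in this talk: the function name `backward_credit`, the reward of 1.0 and the discount of 0.5 are all hypothetical choices made for illustration. Each element receives a share of the final reward that shrinks the further it sits from the goal.

```python
# Hypothetical sketch: spread credit backward from the goal with a fixed discount.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

def backward_credit(sequence, reward=1.0, discount=0.5):
    """Return {element: credit}; elements nearer the reward receive more credit."""
    credit = {}
    value = reward
    for element in reversed(sequence):
        credit[element] = value
        value *= discount  # assumed discount, not taken from the slides
    return credit

credit = backward_credit(SEQUENCE)
for element in SEQUENCE:
    print(f"{element:10s} {credit[element]:.5f}")
```

Note that no element receives any credit until the whole sequence has reached the goal at least once.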
![Page 22: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/22.jpg)
The problem
• If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
• Don’t know if it is the right sequence until goal is reached
• What happens if “variant” needs to be learned?
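The arithmetic on this slide can be checked with a minimal Python sketch. It assumes the six-element sequence from the preceding diagrams (orient through kill-bite). The slide does not say how the larger figure is derived. The sketch only verifies that it equals 108 to the sixth power, so `OPTIONS_PER_ELEMENT = 108` is inferred from the number itself and is not stated by the author.

```python
# Assumption: six elements, as in the orient ... kill-bite diagram.
ELEMENTS = 6
VARIANTS = 3

def count_sequences(options_per_element: int, elements: int = ELEMENTS) -> int:
    """Number of distinct sequences when every element has the same number of options."""
    return options_per_element ** elements

action_only = count_sequences(VARIANTS)

# The slide's stimulus-action figure factors as 108 ** 6. The value 108 is
# inferred from the number and does not appear on the slide.
SLIDE_TOTAL = 1_586_874_322_944
OPTIONS_PER_ELEMENT = 108
with_stimuli = count_sequences(OPTIONS_PER_ELEMENT)

print(action_only)                  # 729
print(with_stimuli == SLIDE_TOTAL)  # True
```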
![Page 23: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/23.jpg)
Leyhausen’s suggestion…
orient → eye → stalk → chase → grab-bite → kill-bite (over time)
Each element is innately self-motivating and has innate reward metric
Each element carries its own motivation & reward
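One way to read this suggestion, sketched in Python under stated assumptions: if every element has its own local reward, each element's variant can be chosen on its own, without waiting for the end of the sequence. The reward table `PREFERRED`, the variant names and the function `best_variants` are made up for illustration and do not come from Leyhausen or from the talk.

```python
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]
VARIANTS = ["a", "b", "c"]

# Made-up table: the variant each element innately "likes" best.
PREFERRED = {"orient": "a", "eye": "c", "stalk": "b",
             "chase": "a", "grab-bite": "c", "kill-bite": "b"}

def local_reward(element, variant):
    """Hypothetical innate reward metric, local to one element."""
    return 1.0 if PREFERRED[element] == variant else 0.0

def best_variants(elements, variants, reward):
    """Choose each element's variant using only that element's own reward."""
    chosen, evaluations = {}, 0
    for element in elements:
        best_score, best_variant = None, None
        for variant in variants:
            evaluations += 1
            score = reward(element, variant)
            if best_score is None or score > best_score:
                best_score, best_variant = score, variant
        chosen[element] = best_variant
    return chosen, evaluations

chosen, evaluations = best_variants(SEQUENCE, VARIANTS, local_reward)
print(chosen)
print(evaluations)  # 18 evaluations instead of 3 ** 6 = 729 whole sequences
```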
![Page 25: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/25.jpg)
Coppinger’s suggestion…
orient → eye → stalk → chase → grab-bite → kill-bite (over time)
Varying innate tendency to follow behavior with “next” in sequence
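A minimal sketch of this idea in Python, assuming the tendency can be modelled as a probability of moving on to the next element. The probabilities in `NEXT_TENDENCY` and the function `run_sequence` are hypothetical; they are not values reported by Coppinger or used in the talk.

```python
import random

SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

# Hypothetical innate tendencies: chance of following each element with the next.
NEXT_TENDENCY = {"orient": 0.9, "eye": 0.8, "stalk": 0.7,
                 "chase": 0.6, "grab-bite": 0.5}

def run_sequence(rng, tendencies=NEXT_TENDENCY):
    """Perform elements in order, stopping when the tendency to continue fails."""
    performed = [SEQUENCE[0]]
    for current, following in zip(SEQUENCE, SEQUENCE[1:]):
        if rng.random() >= tendencies[current]:
            break  # the animal drops out of the sequence here
        performed.append(following)
    return performed

rng = random.Random(0)
for _ in range(3):
    print(run_sequence(rng))
```

Varying the numbers changes how far through the sequence the animal typically gets, with no reference to the functional goal.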
![Page 26: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/26.jpg)
Functional goal plays incidental role
orient → eye → stalk → chase → grab-bite → kill-bite → Yumm.. (over time)
Propagated value from functional goal plays incidental role
![Page 27: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/27.jpg)
Big idea: innate biases make learning possible
• Biases include…
  • Temporal proximity implies causality
  • Attend more readily to certain classes of stimuli than to others (motion vs. speech)
  • Lazy discovery (pay attention once you have a reason to pay attention)
  • Elements may be “innately” self-motivating and have local metric of “goodness”
![Page 28: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/28.jpg)
Good trainers actively guide dog’s exploration
• Behavioral
  • Train behavior, then cue
  • Differential rewards encourage variability
• Motor
  • Shaping
    • Rewarding successive approximations
  • Luring
    • Pose, e.g. “down”
    • Trajectory, e.g. “figure-8”
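Shaping by rewarding successive approximations can be sketched as a criterion that rises each time the dog meets it. This is a toy illustration: the scoring of each attempt as a percentage of a full “down”, the step size and the function `shape` are assumptions, not part of the system described in the talk.

```python
def shape(attempts, target, start_criterion, step):
    """Reward attempts that meet the current criterion, then raise the criterion."""
    criterion = start_criterion
    rewarded = []
    for attempt in attempts:
        if attempt >= criterion:
            rewarded.append(attempt)
            criterion = min(target, criterion + step)  # ask for a bit more next time
    return rewarded, criterion

# Hypothetical attempts: how close each pose is to a full "down", in percent.
attempts = [20, 10, 35, 30, 50, 90, 100]
rewarded, final_criterion = shape(attempts, target=100, start_criterion=20, step=20)
print(rewarded)         # [20, 50, 90, 100]
print(final_criterion)  # 100
```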
![Page 29: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/29.jpg)
Dogs constrain search for causal agents
Timeline: Sit → Approach → Eat
Attention Window: Cue given immediately before or as dog is moving into desired pose
Consequences Window: Trainer “clicks” signaling reward is coming / When reward is actually received
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
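The effect of narrow temporal windows can be shown with a small Python sketch. The event names, times and window widths are invented for illustration; only the idea of an attention window before the action and a consequences window after it comes from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str
    time: float  # seconds on a hypothetical timeline

def in_window(events, start, end):
    """Names of events whose time falls inside [start, end]."""
    return [e.name for e in events if start <= e.time <= end]

events = [
    Event("doorbell", 0.0),
    Event("sit-utterance", 4.6),
    Event("click", 5.8),
    Event("food", 7.0),
    Event("car horn", 12.0),
]

SIT_TIME = 5.0          # when the dog moves into the pose (assumed)
ATTENTION_SPAN = 1.0    # assumed width of the attention window
CONSEQUENCE_SPAN = 3.0  # assumed width of the consequences window

possible_cues = in_window(events, SIT_TIME - ATTENTION_SPAN, SIT_TIME)
possible_consequences = in_window(events, SIT_TIME, SIT_TIME + CONSEQUENCE_SPAN)
print(possible_cues)          # ['sit-utterance']
print(possible_consequences)  # ['click', 'food']
```

Events outside both windows, such as the doorbell and the car horn, are never considered as causes or consequences of the sit.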
![Page 30: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/30.jpg)
Dogs use implicit feedback to guide perceptual learning
Timeline: Sit → Approach → Eat
Events: “sit-utterance” perceived; dog decides to sit; “click” perceived
Result: build & update perceptual model of “sit-utterance”
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models
![Page 31: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/31.jpg)
Dogs give credit where credit is due…
• Trainer repeatedly lures dog through a trajectory or into a pose
• Eventually, dog performs behavior spontaneously
• Implication
  • Dog associates reward with resulting body configuration or trajectory and not just with “follow your nose”
![Page 32: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/32.jpg)
Observation: dogs give credit where credit is due
Timeline: Sit → Approach → Eat
Events: “sit-utterance” perceived; dog decides to sit; “click” perceived
1. Credit sitting in presence of “sit-utterance”
2. Build & update perceptual model of “sit-utterance”
![Page 33: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/33.jpg)
D.L.: Take Advantage of Predictable Regularities
• Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
  • Use consequences to bias choice of action
  • But vary performance and attend to differences
• Explore state and action spaces on “as-needed” basis
  • Build models on demand
![Page 34: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/34.jpg)
D.L.: Make Use of All Feedback: Explicit & Implicit
• Use rewarded action as context for identifying
  • Promising state space and action space to explore
  • Good examples from which to construct perceptual models, e.g.,
    • A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
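The idea of using a rewarded action as the context for collecting good examples can be sketched as follows. This is a minimal Python illustration, not the talk's implementation: the two-number feature vectors, the running-mean prototype and the names `UtteranceModel` and `observe_trial` are all assumptions.

```python
class UtteranceModel:
    """Hypothetical perceptual model: a running mean of example feature vectors."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, features):
        self.count += 1
        if self.mean is None:
            self.mean = list(features)
        else:
            self.mean = [m + (x - m) / self.count
                         for m, x in zip(self.mean, features)]

def observe_trial(model, utterance_features, action, rewarded):
    """Treat the utterance as a good example only if it accompanied a rewarded Sit."""
    if action == "sit" and rewarded:
        model.update(utterance_features)

sit_model = UtteranceModel()
observe_trial(sit_model, [1.0, 3.0], action="sit", rewarded=True)
observe_trial(sit_model, [100.0, 100.0], action="sit", rewarded=False)  # ignored
observe_trial(sit_model, [3.0, 5.0], action="sit", rewarded=True)
print(sit_model.count, sit_model.mean)  # 2 [2.0, 4.0]
```

The unrewarded trial never reaches the model, so noisy or irrelevant sounds do not distort it.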
![Page 35: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/35.jpg)
D.L.: Make Them Easy to Train
• Respond quickly to “obvious” contingencies
• Support Luring and Shaping
  • Techniques to prompt infrequently expressed or novel motor actions
• “Trainer friendly” credit assignment
  • Assign credit to candidate that matches trainer’s expectation
![Page 36: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/36.jpg)
The System
![Page 37: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/37.jpg)
Dobie T. Coyote…
See dobie video on my website
![Page 38: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/38.jpg)
Limitations and Future Work
• Important extensions
  • Other kinds of learning (e.g., social or spatial)
  • Generalization
  • Sequences
  • Expectation-based emotion system
• How will the system scale?
![Page 39: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/39.jpg)
Useful Insights
• Use
  • Temporal proximity to limit search
  • Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
  • “Trainer friendly” credit assignment
• Luring and shaping are essential
![Page 40: Train Your Dog](https://reader033.fdocuments.in/reader033/viewer/2022052910/559cc1e51a28ab79788b45ad/html5/thumbnails/40.jpg)
Acknowledgements
• Members of the Synthetic Characters Group, past, present & future
• Gary Wilkes
• Funded by the Digital Life Consortium