EE141 Motivated Learning based on Goal Creation Janusz Starzyk School of Electrical Engineering and...
EE141
Motivated Learning based on Goal Creation
Janusz Starzyk
School of Electrical Engineering and Computer Science, Ohio University, USA
www.ent.ohiou.edu/~starzyk
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 4 December 2009.

Outline
- Embodied Intelligence (EI)
- Embodiment of Mind
- How to Motivate a Machine
- Goal Creation Hierarchy
- GCS Experiment
- Motivated Learning

Design principles of intelligent systems
from Rolf Pfeifer, “Understanding of Intelligence”, 1999
Interaction with complex environment:
- cheap design
- ecological balance
- redundancy principle
- parallel, loosely coupled processes
- asynchronous
- sensory-motor coordination
- value principle
[Figure: agent interacting with a complex environment. Drawing by Ciarán O’Leary, Dublin Institute of Technology]

Embodied Intelligence
Definition: Embodied Intelligence (EI) is a mechanism that learns how to survive in a hostile environment.
- Mechanism: biological, mechanical or virtual agent with embodied sensors and actuators
- EI acts on the environment and perceives its actions
- Environment hostility is persistent and stimulates EI to act
- Hostility: direct aggression, pain, scarce resources, etc.
- EI learns, so it must have associative self-organizing memory
- Knowledge is acquired by EI

Embodiment of a Mind
- Embodiment is a part of the environment under control of the mind
- It contains the intelligence core and sensory-motor interfaces to interact with the environment
- It is necessary for development of intelligence
- It is not necessarily constant
[Figure: embodiment diagram. Labels: embodiment, intelligence core, sensors, actuators, channel, environment.]

Embodiment of Mind
- Changes in embodiment modify brain’s self-determination
- Brain learns its own body’s dynamics
- Self-awareness is a result of identification with own embodiment
- Embodiment can be extended by using tools and machines
- Successful operation is a function of correct perception of environment and own embodiment

How to Motivate a Machine?
A fundamental question is what motivates an agent to do anything, and in particular, to enhance its own complexity?
What drives an agent to explore the environment and learn ways to effectively interact with it?

How to Motivate a Machine?
- Pfeifer claims that an agent’s motivation should emerge from the developmental process. He called this the “motivated complexity” principle.
- Chicken and egg problem? An agent must have a motivation to develop, while its motivation comes from development?
- Steels suggested equipping an agent with self-motivation. “Flow”, experienced when people perform their expert activity well, would motivate them to accomplish even more complex tasks.
- But what is the mechanism of “flow”?
- Oudeyer proposed an intrinsic motivation system. Motivation comes from a desire to minimize the prediction error. This is similar to the “artificial curiosity” presented by Schmidhuber.

How to Motivate a Machine?
- Although artificial curiosity helps to explore the environment, it leads to learning without a specific purpose. It may be compared to exploration in reinforcement learning.
- Exploration is needed in order to learn and to model the environment.
- But is exploration the only motivation we need to develop EI?
- Can we find a more efficient mechanism for learning?
- I suggest a simpler mechanism to motivate a machine.

How to Motivate a Machine?
- I suggest that the hostility of the environment, as in the definition of EI, is the most effective motivational factor.
- It is the pain we receive that moves us.
- It is our intelligence, determined to reduce this pain, that motivates us to act, learn, and develop.
- Both are needed: hostility of the environment and intelligence that learns how to reduce the pain.
- Thus pain is good. Without pain we would not be motivated to develop.
Fig. englishteachermexico.wordpress.com/

Motivated Learning
- I suggest a goal-driven mechanism to motivate a machine to act, learn, and develop.
- A simple pain-based goal creation system.
- It uses externally defined pain signals that are associated with primitive pains.
- The machine is rewarded for minimizing the primitive pain signals.
Definition: Motivated learning (ML) is learning based on the self-organizing system of goal creation in an embodied agent.
- The machine creates abstract goals based on the primitive pain signals.
- It receives internal rewards for satisfying its goals (both primitive and abstract).
- ML applies to EI working in a hostile environment.
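
The reward rule above can be illustrated with a small sketch. This is not the GCS implementation from the talk: it assumes a single primitive pain (hunger) that grows unless the agent eats, and it takes the internal reward to be the decrease in pain after an action. The class name `PainAgent`, the action names, the learning rate, and the pain dynamics are all hypothetical.

```python
import random


class PainAgent:
    """Toy agent rewarded for reducing one primitive pain signal (hypothetical)."""

    def __init__(self, actions, lr=0.5, epsilon=0.1, seed=0):
        self.values = {a: 0.0 for a in actions}  # learned usefulness of each action
        self.lr = lr                             # learning rate (assumed)
        self.epsilon = epsilon                   # exploration probability (assumed)
        self.rng = random.Random(seed)

    def choose(self):
        # Occasionally explore, otherwise take the action with the highest value.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def learn(self, action, pain_before, pain_after):
        # Internal reward = reduction of the pain signal.
        reward = pain_before - pain_after
        self.values[action] += self.lr * (reward - self.values[action])


def run(steps=200, seed=0):
    agent = PainAgent(["eat", "sleep", "play"], seed=seed)
    pain = 0.5
    for _ in range(steps):
        action = agent.choose()
        before = pain
        # Assumed environment: only eating reduces hunger, otherwise it grows.
        if action == "eat":
            pain = max(0.0, pain - 0.3)
        else:
            pain = min(1.0, pain + 0.05)
        agent.learn(action, before, pain)
    return agent.values
```

Calling `run()` returns the learned action values. In this toy environment only "eat" ever earns a positive reward, so it ends up as the preferred action.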

Pain-center and Goal Creation
- Simple mechanism
- Creates hierarchy of values
- Leads to formulation of complex goals
- Reinforcement: pain increase, pain decrease
- Forces exploration
[Figure: pain detection/goal creation center. Labels: sensor, motor, pain detection, dual pain memory, pain increase, pain decrease, stimulation, activation, need, expectation, inhibition, missing objects, (+), (-). Legend: pain detection/goal creation center, reinforcement neurotransmitter, sensory neuron, motor neuron.]

Abstract Goal Creation for ML
- The goal is to reduce the primitive pain level
- Abstract goals are created if they satisfy the primitive goals
- “food” becomes a sensory input to the abstract pain center
[Figure: goal hierarchy with a sensory pathway (perception, sense) and a motor pathway (action, reaction). Levels: Primitive Level, Level I, Level II. Sensory labels: stomach, food, refrigerator. Motor labels: Eat, Open. Other labels: pain, dual pain, abstract pain (delayed memory of pain). Legend: expectation, association, inhibition, reinforcement, connection, planning.]
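
One way to read the food/refrigerator example is sketched below. It rests on assumptions not stated on the slide: an abstract pain is created the first time an action fails because its resource is missing, and a higher-level action (such as opening the refrigerator) restores that resource. The class `GoalCreator` and its methods are hypothetical names, not part of GCS.

```python
class GoalCreator:
    """Hypothetical sketch: creates an abstract pain when a needed resource is missing."""

    def __init__(self):
        self.pains = {"hunger": 0.0}  # the primitive pain is defined externally
        self.needs = {}               # abstract pain -> resource it tracks

    def try_action(self, pain, resource, available):
        """Try to reduce `pain` with an action that consumes one unit of `resource`."""
        if available.get(resource, 0) > 0:
            available[resource] -= 1
            self.pains[pain] = 0.0
            return True
        # The action failed because the resource is missing: the resource
        # becomes the sensory input of a new abstract pain center.
        abstract = f"lack of {resource}"
        self.needs.setdefault(abstract, resource)
        self.pains[abstract] = 1.0
        return False

    def satisfy(self, abstract, available, amount=1):
        """An abstract goal (e.g. opening the refrigerator) restores the resource."""
        resource = self.needs[abstract]
        available[resource] = available.get(resource, 0) + amount
        self.pains[abstract] = 0.0
```

For example, with `available = {"food": 0}` and hunger set to 1.0, `try_action("hunger", "food", available)` fails and creates the abstract pain "lack of food". After `satisfy("lack of food", available)`, the same call succeeds and hunger returns to 0.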

Goal Creation Experiment
Sensory-motor pairs and their effect on the environment

PAIR #   SENSORY   MOTOR      INCREASES           DECREASES
1        Food      Eat        sugar level         food supplies
8        Grocery   Buy        food supplies       money at hand
15       Bank      Withdraw   money at hand       spending limits
22       Office    Work       spending limits     job opportunities
29       School    Study      job opportunities   -
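
The table can be encoded directly as data, which makes the dependency chain between resources explicit. The sketch below is illustrative only: the pair numbers are omitted, and the fixed `amount` per action and the rule that an action fails when its consumed resource is insufficient are assumptions, not details of the GCS experiment.

```python
# (sensory, motor) -> (resource increased, resource decreased or None)
PAIRS = {
    ("Food", "Eat"): ("sugar level", "food supplies"),
    ("Grocery", "Buy"): ("food supplies", "money at hand"),
    ("Bank", "Withdraw"): ("money at hand", "spending limits"),
    ("Office", "Work"): ("spending limits", "job opportunities"),
    ("School", "Study"): ("job opportunities", None),
}


def apply_pair(levels, sensory, motor, amount=1.0):
    """Return new resource levels after one sensory-motor action.

    The action fails (levels unchanged) if the resource it consumes is
    insufficient; the fixed `amount` is an assumption of this sketch.
    """
    increases, decreases = PAIRS[(sensory, motor)]
    new = dict(levels)
    if decreases is not None:
        if new.get(decreases, 0.0) < amount:
            return new
        new[decreases] -= amount
    new[increases] = new.get(increases, 0.0) + amount
    return new


def chain_to(resource):
    """List the actions, deepest dependency first, needed to raise `resource`."""
    producers = {inc: (pair, dec) for pair, (inc, dec) in PAIRS.items()}
    chain = []
    while resource is not None:
        pair, resource = producers[resource]
        chain.append(pair)
    return chain[::-1]
```

`chain_to("sugar level")` walks the table from eating back to studying, which is the hierarchy of abstract goals the agent has to discover.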

Goal Creation Experiment in ML
Pain signals in GCS simulation
[Figure: three panels (Primitive Hunger, Lack of Food, Empty Grocery). x-axis: discrete time, 0 to 600; y-axis: pain.]

Goal Creation Experiment in ML
Action scatters in 5 GCS simulations
[Figure: goal scatter plot. x-axis: discrete time, 0 to 600; y-axis: goal ID, 0 to 40.]

Goal Creation Experiment in ML
The average pain signals in 100 GCS simulations
[Figure: five panels (Primitive Hunger, Lack of Food, Empty Grocery, Lack of Money, Lack of Job Opportunities). x-axis: discrete time, 0 to 600; y-axis: pain.]

Compare RL (TDF) and ML (GCS)
Mean primitive pain Pp value as a function of the number of iterations:
- green line for TDF
- blue line for GCS
Primitive pain ratio with pain threshold 0.1

Compare RL (TDF) and ML (GCS)
Comparison of execution time on log-log scale:
- green line for TD-Falcon
- blue line for GCS
Combined efficiency of GCS is 1000 times better than that of TDF.
Problem solved
Conclusion: embodied intelligence with motivated learning based on goal creation is an effective learning and decision-making system for dynamic environments.

Reinforcement Learning
- Single value function
- Measurable rewards
- Can be optimized
- Predictable
- Objectives set by designer
- Maximizes the reward
- Potentially unstable
- Learning effort increases with complexity
- Always active

Motivated Learning
- Multiple value functions, one for each goal
- Internal rewards
- Cannot be optimized
- Unpredictable
- Sets its own objectives
- Solves minimax problem
- Always stable
- Learns better in complex environment than RL
- Acts when needed
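
The contrast between "maximizes the reward" and "solves minimax problem" can be made concrete with a short sketch. It assumes one reading of the minimax item: among the available actions, the agent picks the one whose predicted outcome leaves the smallest maximum pain. The function names, the pain names, and the predicted effects are hypothetical.

```python
def dominant_pain(pains):
    """Name of the currently strongest pain signal."""
    return max(pains, key=pains.get)


def minimax_action(pains, effects):
    """Pick the action whose predicted outcome has the smallest maximum pain.

    pains:   pain name -> current level in [0, 1]
    effects: action -> {pain name: predicted change}, an assumed forward model
    """
    def worst_after(action):
        return max(
            min(1.0, max(0.0, level + effects[action].get(name, 0.0)))
            for name, level in pains.items()
        )
    return min(effects, key=worst_after)
```

With `pains = {"hunger": 0.8, "lack of food": 0.3}`, an action that lowers hunger by 0.7 while raising "lack of food" by 0.2 is preferred over one that only removes the smaller pain, because it leaves the lowest worst-case pain.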

Sounds like science fiction
If you’re trying to look far ahead, and what you see seems like science fiction, it might be wrong.
But if it doesn’t seem like science fiction, it’s definitely wrong.
From a presentation by the Foresight Institute

Resources – Evolution of Electronics
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
Clock Speed (doubles every 2.7 years)

Doubling (or Halving) times
Dynamic RAM Memory “Half Pitch” Feature Size: 5.4 years
Dynamic RAM Memory (bits per dollar): 1.5 years
Average Transistor Price: 1.6 years
Microprocessor Cost per Transistor Cycle: 1.1 years
Total Bits Shipped: 1.1 years
Processor Performance in MIPS: 1.8 years
Transistors in Intel Microprocessors: 2.0 years
Microprocessor Clock Speed: 2.7 years
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
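
A doubling time T converts to a growth factor over t years as 2^(t/T), assuming steady exponential growth over the whole period. The helper names below are hypothetical, and the example inputs are the doubling times quoted above.

```python
import math


def growth_factor(years, doubling_time):
    """Multiplicative growth after `years`, assuming a constant doubling time."""
    return 2.0 ** (years / doubling_time)


def years_to_grow(factor, doubling_time):
    """Years needed to grow by `factor`, under the same exponential assumption."""
    return doubling_time * math.log2(factor)
```

For instance, `growth_factor(10, 2.0)` gives the increase in transistor count over ten years at a 2.0-year doubling time, and `years_to_grow(1000, 1.8)` gives the time for a thousandfold gain in MIPS at a 1.8-year doubling time.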

Software or hardware?

Software
- Sequential
- Error prone
- Requires programming
- Low cost
- Well developed programming methods

Hardware
- Concurrent
- Robust
- Requires design
- Significant cost
- Hardware prototypes hard to build

Future software/hardware capabilities
[Figure: number of neurons (10^4 to 10^11, log scale) versus year (2005 to 2040). Curves: software simulation (PC based), hardware approach (FPGA), analog VLSI. Reference level: human brain complexity.]

Design Productivity Gap: Low-Value Designs?
Percent of die area that must be occupied by memory to maintain SOC design productivity
[Figure: stacked area shares from 0% to 100% for 1999, 2002, 2005, 2008, 2011, 2014. Series: % area memory, % area reused logic, % area new logic.]
Source = Japanese system-LSI industry

Self-Organizing Learning Arrays (SOLAR)
Integrated circuits connect transistors into a system:
- millions of transistors easily assembled
- first 50 years of microelectronic revolution
Self-organizing arrays connect processors into a system:
- millions of processors easily assembled
- next 50 years of microelectronic revolution

- Self-organization
- Sparse and local interconnections
- Dynamically reconfigurable
- Online data-driven learning