Lecture about Agents that Learn 3rd April 2000 INT4/2I1235.

Lecture about Agents that Learn

• 3rd April 2000

• INT4/2I1235

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Introduction

• Today's topic

• Who is the lecturer

• Why do we have this lecture

Today's topic

• How do agents learn?

• What are the benefits of learning agents?

• Learning in isolation, or in cooperation?

Who is the lecturer

• Johan Kummeneje

• Doctoral Student

• RoboCup, Social Decisions, and Java

Why do we have this lecture

• Beats me….. You tell me.

• Take 2 minutes to think about why this is interesting, and then I will ask 2 or 3 of you what you think.

Centralized vs Decentralized

• Introduction

• The Degree of Decentralization

• Interaction-specific features

• Involvement-specific features

• Goal-specific features

• The learning method

• The learning feedback

Introduction

• Learning process: all activities (planning, inference, decision steps, etc.) carried out to reach a learning goal

• Centralized learning or isolated learning

• Decentralized learning or interactive learning

The Degree of Decentralization

• Distributedness

• Parallelism

Interaction-specific features

• Level of interaction (from "simple" observation to complex negotiations and dialogues)

• Persistence of interaction (short to long)

• Frequency (low to high)

• Pattern (unstructured to hierarchical)

• Variability (fixed to dynamic)

Involvement-specific features

• Relevance to the learning process

• Role in the learning process

• Generalist vs specialist

Goal-specific features

• Improvement (individual vs social)

• Conflicting vs compatible goals

The learning method

• Rote learning (Swedish "korvstoppning", i.e. cramming)

• Learning from instruction and advice

• Learning from examples and practice (Learning by Doing, Baden-Powell)

• Learning by analogy

• Learning by discovery

The learning effort required increases from top to bottom.

The learning feedback

• Supervised (the feedback tells which action is the best)

• Reinforcement (the feedback gives only the utility of the action taken; the learner tries to maximize it)

• Unsupervised (no explicit feedback)

Credit Assignment Problem

• Inter-agent CAP (how to divide the credit for a result among the different agents)

• Intra-agent CAP (how to divide an agent's credit among the different actions performed within that agent)
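
As a minimal sketch of the inter-agent case, assume (hypothetically) that each agent's contribution to a joint result can be scored with a non-negative number. A team reward could then be divided in proportion to those scores. The function name `split_credit` and the contribution scores are illustrative and not part of the lecture; obtaining such scores is the hard part of the problem.

```python
def split_credit(team_reward, contributions):
    """Divide a team reward among agents in proportion to their contributions.

    `contributions` maps agent name -> non-negative contribution score
    (a hypothetical measure, assumed to be given).
    """
    total = sum(contributions.values())
    if total == 0:
        # No agent stands out: fall back to an equal split.
        share = team_reward / len(contributions)
        return {agent: share for agent in contributions}
    return {agent: team_reward * c / total for agent, c in contributions.items()}


# Three agents share a reward of 10; a3 contributed nothing.
credit = split_credit(10.0, {"a1": 3, "a2": 1, "a3": 0})
```

The same function could be applied to the intra-agent case by replacing agent names with the names of the actions one agent performed.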

Learning and Activity Coordination

• Introduction

• Reinforcement Learning

– Q-Learning and Learning Classifier Systems

• Isolated, Concurrent Reinforcement Learners

• Interactive Reinforcement Learning of Coordination

– ACE and AGE

Introduction

• Activity Coordination

• Adaptation to differences in the coordination process

• Effectively utilizing opportunities and avoiding pitfalls

Reinforcement Learning

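• Optimise the feedback (reinforcement)

• Modeled by a Markov decision process

• $\langle S, A, P, r \rangle$: states $S$, actions $A$, transition probabilities $P : S \times S \times A \to [0, 1]$, and reward function $r$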

Q-Learning

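• When feedback is received, update the Q-value

• $Q(s,a) \leftarrow (1-\beta)\,Q(s,a) + \beta\,\bigl(R + \gamma \max_{a'} Q(s',a')\bigr)$

• where $\beta$ is a small constant called the learning rate, $\gamma$ is the discount factor, and $s'$ is the state reached after taking action $a$ in state $s$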
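
The update rule can be written out as a short sketch. The table layout (a dictionary keyed by state-action pairs), the parameter defaults, and the state and action names are assumptions made for illustration only.

```python
from collections import defaultdict


def q_update(Q, s, a, reward, s_next, actions, beta=0.1, gamma=0.9):
    """Apply one Q-learning update for the transition (s, a) -> s_next.

    beta is the learning rate and gamma the discount factor.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - beta) * Q[(s, a)] + beta * (reward + gamma * best_next)
    return Q[(s, a)]


# Q-values start at zero; states and actions are hypothetical names.
Q = defaultdict(float)
actions = ["left", "right"]
first = q_update(Q, "s0", "right", 1.0, "s1", actions)
second = q_update(Q, "s1", "left", 0.0, "s0", actions)
```

The second update shows how reward propagates backwards: "s1" receives no reward of its own, but its value grows because it leads to "s0", where a rewarded action is known.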

Learning Classifier Systems

• A classifier is a (condition, action) pair

• Strength of the classifier at a given time: S(c,a)

• At each time step a classifier is chosen from a match set (the classifiers whose condition matches the environment)

• Feedback is received and S is modified accordingly.
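
A minimal sketch of this cycle follows. The greedy choice of the strongest matching classifier and the moving-average strength update are assumptions chosen for brevity; the slide does not fix either, and real classifier systems typically use richer selection and credit schemes. The percepts and actions are hypothetical.

```python
def match_set(strengths, percept):
    """Return the classifiers whose condition matches the current percept."""
    return [(c, a) for (c, a) in strengths if c == percept]


def choose(strengths, percept):
    """Pick the strongest matching classifier (greedy choice, an assumption)."""
    candidates = match_set(strengths, percept)
    return max(candidates, key=lambda cl: strengths[cl])


def reinforce(strengths, classifier, feedback, beta=0.2):
    """Move the classifier's strength towards the received feedback."""
    strengths[classifier] = (1 - beta) * strengths[classifier] + beta * feedback


# Strength table S(c, a) for three hypothetical classifiers.
strengths = {
    ("wall_ahead", "turn"): 1.0,
    ("wall_ahead", "forward"): 0.5,
    ("clear", "forward"): 1.0,
}
chosen = choose(strengths, "wall_ahead")
reinforce(strengths, chosen, feedback=2.0)
```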

Isolated, Concurrent Reinforcement Learners

• Agent coupling

• Agent relationships

• Feedback timing

• Optimal behaviour combinations

• CIRL (concurrent isolated reinforcement learners)

• No modelling of other agents

• In cooperative situations, complementary policies can be developed

• Adapts to similar situations.

Interactive Reinforcement Learning of Coordination

• Eliminates incompatible actions

• Agents can observe the set of considered actions of other agents.

• Two different alternatives are ACE and AGE

Action Estimate Algorithm (ACE)

• Each agent calculates the set of performable actions

• For each of these, the agent calculates the goal relevance (GR).

• For every action with a GR above a threshold, the agent calculates and announces a bid that includes a risk factor $a$ and a noise term $b$:

• $B(S) = (a + b)\,E(S)$, where $E(S)$ is the estimated goal relevance of the action in state $S$

• Incompatible actions are removed, and the action with the highest bid is then executed.

• The feedback increases the probability that successful actions are performed in the future.
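
The bidding and selection steps can be sketched as follows. This is a simplified illustration, not the full algorithm: it assumes each agent executes at most one action, represents incompatibility as a set of action pairs, and leaves out the learning step that adjusts the estimates from feedback. All names and numbers are hypothetical.

```python
import random


def make_bid(estimate, risk=0.1, noise_sd=0.01, rng=random):
    """Bid B(S) = (a + b) * E(S): a is the risk factor, b a small noise term."""
    return (risk + rng.gauss(0.0, noise_sd)) * estimate


def ace_select(estimates, incompatible, threshold=0.0, noise_sd=0.01, rng=random):
    """Pick a compatible set of (agent, action) pairs, highest bid first.

    estimates: {(agent, action): E(S)} for the performable actions
    incompatible: set of frozensets, each holding two (agent, action) pairs
                  that cannot be executed together
    """
    bids = {key: make_bid(e, noise_sd=noise_sd, rng=rng)
            for key, e in estimates.items() if e > threshold}
    selected = []
    for key in sorted(bids, key=bids.get, reverse=True):
        agent_busy = any(key[0] == other[0] for other in selected)
        clash = any(frozenset((key, other)) in incompatible for other in selected)
        if not agent_busy and not clash:
            selected.append(key)
    return selected


estimates = {("a1", "push"): 5.0, ("a1", "wait"): 1.0,
             ("a2", "pull"): 4.0, ("a2", "wait"): 2.0}
incompatible = {frozenset({("a1", "push"), ("a2", "pull")})}
# noise_sd=0.0 switches the noise off so the result is reproducible.
chosen = ace_select(estimates, incompatible, noise_sd=0.0)
```

With the noise switched off, agent a1 wins with "push", which rules out a2's "pull", so a2 falls back to "wait".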

Action Group Estimate Algorithm (AGE)

• All applicable actions from each agent are collected into all possible activity contexts, in each of which all actions are mutually compatible.

• Using the same bidding strategy as ACE, the activity context with the highest sum of bids is chosen for execution.

• Credit assignment depends on the actions performed and the relevance of each action.

• Requires more computational effort than ACE.
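
A brute-force sketch of the context selection is given below. It assumes that an activity context contains exactly one action per agent and that the bids have already been computed; both are simplifications for illustration, and the data is hypothetical.

```python
from itertools import product


def age_select(bids, incompatible):
    """Return the mutually compatible activity context with the highest bid sum.

    bids: {agent: {action: bid}}
    incompatible: set of frozensets of two (agent, action) pairs
    """
    agents = sorted(bids)
    best_context, best_sum = None, float("-inf")
    # Enumerate one action per agent. The number of combinations grows
    # exponentially with the number of agents.
    for choice in product(*(bids[a].items() for a in agents)):
        context = [(agent, action) for agent, (action, _) in zip(agents, choice)]
        compatible = all(frozenset((x, y)) not in incompatible
                         for i, x in enumerate(context) for y in context[i + 1:])
        total = sum(bid for _, bid in choice)
        if compatible and total > best_sum:
            best_context, best_sum = context, total
    return best_context, best_sum


bids = {"a1": {"push": 0.5, "wait": 0.1}, "a2": {"pull": 0.4, "wait": 0.2}}
incompatible = {frozenset({("a1", "push"), ("a2", "pull")})}
context, total = age_select(bids, incompatible)
```

Enumerating every combination of actions is what makes this group-level search more expensive than picking single actions one at a time.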

Learning about and from other agents

• Introduction

• Learning Organizational Roles

• Learning in Market Environments

Introduction

• Learning to improve the individual performance

• At the expense of other agents

• Anticipatory agents, RMM (Recursive Modeling Method)

Learning Organizational Roles

• Agents learn roles so that they complement each other better.

• Each agent can take one role at a time from a set of roles; the task is to choose the most appropriate role (minimising costs).

• f(U, P, C, Potential): a role is rated as a function of its utility U, probability P, cost C, and potential.

Learning in Market Environments

• Agents sell/buy information from each other.

• 0-level agents do not model other agents

• 1-level agents model other agents as 0-level agents

• 2-level agents model other agents as 1-level agents

Learning and Communication

• Introduction

• Reducing Communication by Learning

• Improving Learning by Communication

Introduction

• Learning to communicate

• Communicating as learning

• What to communicate?

• When to communicate?

• With whom to communicate?

• How to communicate?

Reducing Communication by Learning

• Learning about the abilities of other agents.

• Learning which agents to ask, instead of broadcasting

• Problem similarities
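
One way to picture "learning which agents to ask" is a small record of who has solved which kind of problem before. The sketch below is an illustration under assumed names (`AskList`, the problem type "routing", the agent names); it broadcasts only while nothing is known and otherwise addresses the agent with the best record.

```python
from collections import defaultdict


class AskList:
    """Track which agents answered which kind of problem successfully."""

    def __init__(self):
        # problem type -> agent -> number of successful answers
        self.successes = defaultdict(lambda: defaultdict(int))

    def record(self, problem_type, agent, solved):
        """Remember the outcome of asking `agent` about `problem_type`."""
        if solved:
            self.successes[problem_type][agent] += 1

    def whom_to_ask(self, problem_type, all_agents):
        """Return the best-known agent, or everyone if nothing is known yet."""
        known = self.successes.get(problem_type)
        if not known:
            return list(all_agents)  # fall back to broadcasting
        return [max(known, key=known.get)]


asker = AskList()
agents = ["a1", "a2", "a3"]
first = asker.whom_to_ask("routing", agents)   # nothing learned yet
asker.record("routing", "a2", True)
asker.record("routing", "a3", False)
second = asker.whom_to_ask("routing", agents)  # a2 has a track record
```

Grouping the record by problem type is a crude stand-in for the "problem similarities" bullet: experience with one problem is reused for others of the same kind.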

Improving Learning by Communication

• Communicating beliefs and pieces of information

• Explanation

• Ontologies

• Finding out complex relationships between different agents and actions.

Summary

• We have seen the focus move from isolated (individual, centralized) learning to a more diverse range of learning approaches.

• Besides standard (old) ML methods, some new ML algorithms have been proposed.

• Agents learn to improve communication and cooperation.

Further reading

• Peter Stone, Ph.D. thesis

• Weiss (course material), chapter 6

• Russell and Norvig, Artificial Intelligence: A Modern Approach

• THE END