
Lecture about Agents that Learn

• 3rd April 2000

• INT4/2I1235

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Introduction

• Today's topic

• Who is the lecturer

• Why do we have this lecture

Today's topic

• How do agents learn?

• What are the benefits of learning agents?

• Learning in isolation, or in cooperation?

Who is the lecturer

• Johan Kummeneje

• Doctoral Student

• RoboCup, Social Decisions, and Java

Why do we have this lecture

• Beats me… You tell me.

• Take 2 minutes to think about why this is interesting, and then I will ask 2 or 3 of you what you think.

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Centralized vs Decentralized

• Introduction

• The Degree of Decentralization

• Interaction-specific features

• Involvement-specific features

• Goal-specific features

• The learning method

• The learning feedback

Introduction

• The learning process involves planning, inference, decision steps, etc.

• Centralized learning, or isolated learning

• Decentralized learning, or interactive learning

The Degree of Decentralization

• Distributedness

• Parallelism

Interaction-specific features

• Level of interaction (from "simple" observation to complex negotiations and dialogues)

• Persistence of interaction (short to long)

• Frequency (low to high)

• Pattern (unstructured to hierarchical)

• Variability (fixed to dynamic)

Involvement-specific features

• Relevance to the learning process

• Role in the learning process

• Generalist vs specialist

Goal-specific features

• Improvement (individual vs social)

• Conflicting vs compatible goals

The learning method

• Rote learning ("korvstoppning", Swedish for cramming)

• Learning from instruction and advice

• Examples and practice (Learning by Doing, Baden-Powell)

• Analogy

• Discovery

The effort required of the learner increases from top to bottom.

The learning feedback

• Supervised (a teacher specifies which action is best)

• Reinforcement (a reward signal rates the utility of the action, and the agent tries to maximize it)

• Unsupervised (no explicit feedback)

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Credit Assignment Problem

• Inter-agent CAP (how to divide credit among the different agents)

• Intra-agent CAP (how to divide credit among the different actions performed within an agent; a sketch of both levels follows)
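A minimal sketch of the two levels in Python, assuming an equal split among agents (inter-agent) and a recency-weighted split over an agent's own actions (intra-agent); both schemes are illustrative choices, not something the lecture prescribes:

```python
def inter_agent_credit(team_reward, agent_ids):
    """Inter-agent CAP: divide a global reward among agents (equal split)."""
    share = team_reward / len(agent_ids)
    return {agent: share for agent in agent_ids}

def intra_agent_credit(agent_reward, actions, decay=0.9):
    """Intra-agent CAP: spread an agent's share over its action
    sequence, weighting recent actions more heavily (assumed scheme)."""
    weights = [decay ** (len(actions) - 1 - i) for i in range(len(actions))]
    total = sum(weights)
    return {a: agent_reward * w / total for a, w in zip(actions, weights)}

shares = inter_agent_credit(10.0, ["a1", "a2"])
print(intra_agent_credit(shares["a1"], ["move", "pass", "shoot"]))
```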

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Learning and Activity Coordination

• Introduction

• Reinforcement Learning
– Q-Learning and Learning Classifier Systems

• Isolated, Concurrent Reinforcement Learners

• Interactive Reinforcement Learning of Coordination
– ACE and AGE

Introduction

• Activity coordination

• Adaptation to differences in the coordination process

• Effective use of opportunities and avoidance of pitfalls

Reinforcement Learning

• Optimize the feedback (reinforcement)

• Modeled by a Markov decision process

• ⟨S, A, P, r⟩, where P : S × S × A → [0, 1] is the state-transition function and r is the reward function

Q-Learning

• On receiving feedback, update the Q-value:

• Q(s, a) ← (1 − β) Q(s, a) + β (r + γ max_{a'} Q(s', a'))

• where β is a small constant called the learning rate and γ is the discount factor; a code sketch follows
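A minimal tabular Q-learning sketch of this update. The environment interface (reset/step/actions) and the epsilon-greedy exploration policy are assumptions for illustration; only the update line follows the slide:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, beta=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # Q(s, a), initialized to 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (assumed policy).
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda act: q[(s, act)])
            s_next, r, done = env.step(a)
            # The update from the slide:
            # Q(s,a) <- (1-beta) Q(s,a) + beta (r + gamma max_a' Q(s',a'))
            best_next = max((q[(s_next, a2)] for a2 in env.actions(s_next)),
                            default=0.0)
            q[(s, a)] = (1 - beta) * q[(s, a)] + beta * (r + gamma * best_next)
            s = s_next
    return q
```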

Learning Classifier Systems

• A classifier is a (condition, action) pair

• Each classifier has a strength S(c, a) at any given time

• At each time step a classifier is chosen from a match set (the classifiers whose conditions match the environment)

• Feedback is received and the strength S is modified accordingly (see the sketch below)
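A minimal sketch of one time step. The matching predicate, the strength-proportional selection, and the simple strength update are illustrative assumptions:

```python
import random

class Classifier:
    def __init__(self, condition, action, strength=1.0):
        self.condition = condition  # predicate over observations
        self.action = action
        self.strength = strength

def lcs_step(classifiers, observation, execute, beta=0.1):
    # Match set: classifiers whose condition matches the environment.
    match_set = [c for c in classifiers if c.condition(observation)]
    if not match_set:
        return
    # Choose a classifier with probability proportional to its strength.
    weights = [max(c.strength, 1e-6) for c in match_set]
    chosen = random.choices(match_set, weights=weights)[0]
    reward = execute(chosen.action)
    # Modify the strength S according to the feedback.
    chosen.strength += beta * (reward - chosen.strength)
```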

Isolated, Concurrent Reinforcement Learners

• Agent coupling

• Agent relationships

• Feedback timing

• Optimal behaviour combinations

• CIRL (concurrent, isolated reinforcement learners):

• No modelling of other agents

• In cooperative situations, complementary policies can be developed

• Adapts to similar situations

Interactive Reinforcement Learning of Coordination

• Eliminates incompatible actions

• Agents can observe the set of actions considered by other agents

• Two alternatives are ACE and AGE

Action Estimate Algorithm (ACE)

• Each agent calculates the set of actions it can perform

• For each of these, the agent calculates the goal relevance

• For every action with a goal relevance above a threshold, the agent calculates and announces a bid that includes a risk factor and a noise term:

• B(S) = (α + β) E(S)

• Incompatible actions are removed; the agent then executes the action with the highest bid

• The feedback increases the probability that successful actions are performed in the future (a sketch follows)
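A minimal sketch of the bidding step for one agent, assuming hypothetical goal_relevance, estimate, and compatibility helpers; the constants and the noise distribution are likewise illustrative:

```python
import random

ALPHA = 0.1      # risk factor (assumed value)
THRESHOLD = 0.5  # goal-relevance threshold (assumed value)

def ace_step(agent, other_bids):
    # 1. The set of actions the agent can perform.
    actions = agent.performable_actions()
    # 2-3. Bid on every action whose goal relevance exceeds the
    # threshold; the bid adds a risk factor and a noise term to the
    # estimate E, as in B(S) = (alpha + beta) E(S).
    bids = {}
    for action in actions:
        if agent.goal_relevance(action) > THRESHOLD:
            noise = random.uniform(0.0, 0.05)
            bids[action] = (ALPHA + noise) * agent.estimate(action)
    # 4. Remove actions incompatible with what other agents announced.
    compatible = {a: b for a, b in bids.items()
                  if all(agent.compatible(a, other) for other in other_bids)}
    # 5. Execute the compatible action with the highest bid.
    if compatible:
        return max(compatible, key=compatible.get)
    return None
```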

Action Group Estimate Algorithm (AGE)

• The applicable actions of all agents are collected into activity contexts: sets in which all actions are mutually compatible.

• Using the same bidding strategy as ACE, the activity context with the highest sum of bids is chosen for execution.

• Credit assignment depends on the actions performed and their relevance.

• Requires more computational effort than ACE, as the sketch below suggests.
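A minimal sketch of AGE's context selection. Enumerating all subsets of announced actions is a deliberate simplification that also shows where the extra computational cost comes from; the bid values and the compatibility predicate are assumed inputs:

```python
from itertools import combinations

def age_select(all_bids, compatible):
    """all_bids: dict mapping (agent, action) -> bid value.
    compatible: predicate over two (agent, action) pairs."""
    items = list(all_bids)
    best_context, best_sum = (), 0.0
    # Enumerate candidate activity contexts (subsets of announced actions).
    for size in range(1, len(items) + 1):
        for context in combinations(items, size):
            # Keep only contexts whose actions are mutually compatible.
            if all(compatible(x, y) for x, y in combinations(context, 2)):
                total = sum(all_bids[x] for x in context)
                # Choose the context with the highest sum of bids.
                if total > best_sum:
                    best_context, best_sum = context, total
    return best_context
```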

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Learning about and from other agents

• Introduction

• Learning Organizational Roles

• Learning in Market Environments

Introduction

• Learning to improve individual performance

• Possibly at the expense of other agents

• Anticipatory agents, RMM (the Recursive Modeling Method)

Learning Organizational Roles

• Agents learn roles so as to better complement each other.

• Each agent can take on one of a set of roles (one at a time); the learning task is to choose the most appropriate role (minimizing cost).

• A role is rated by a function f(U, P, C, Potential); a sketch of such a rating follows.
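A minimal sketch, assuming U, P, and C stand for utility, probability (of success), and cost estimates, and that f combines them as a weighted sum; both the weights and the example numbers are illustrative:

```python
def f(u, p, c, potential, w=(1.0, 1.0, 1.0, 0.5)):
    # Higher utility, probability, and potential favour a role;
    # cost counts against it (assumed weighted-sum form of f).
    return w[0] * u + w[1] * p - w[2] * c + w[3] * potential

def choose_role(roles):
    """roles: dict mapping role name -> (U, P, C, Potential) estimates."""
    return max(roles, key=lambda r: f(*roles[r]))

roles = {"defender": (0.6, 0.9, 0.2, 0.3),
         "attacker": (0.9, 0.5, 0.4, 0.6)}
print(choose_role(roles))
```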

Learning in Market Environments

• Agents sell/buy information from each other.

• 0-level agents do not model other agents

• 1-level agents model other agents as 0-level agents

• 2-level agents model other agents as 1-level agents (see the sketch below)
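A minimal sketch of the nesting in a bidding setting. Only the 0/1/2-level structure follows the slide; the pricing rule and the best-response logic are illustrative assumptions:

```python
def level0_bid(value):
    # 0-level: no model of others; bid the information's own value.
    return value

def level1_bid(value, opponent_value):
    # 1-level: model the opponent as a 0-level agent, and beat the
    # predicted bid only where that is still profitable.
    predicted = level0_bid(opponent_value)
    return min(value, predicted + 0.01)

def level2_bid(value, opponent_value, own_value_as_seen):
    # 2-level: model the opponent as a 1-level agent, i.e. an agent
    # that in turn models this agent as 0-level.
    predicted = level1_bid(opponent_value, own_value_as_seen)
    return min(value, predicted + 0.01)

print(level1_bid(value=1.0, opponent_value=0.8))  # narrowly beats 0.8
```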

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Learning and Communication

• Introduction

• Reducing Communication by Learning

• Improving Learning by Communication

Introduction

• Learning to communicate

• Communicating as learning

• What to communicate?

• When to communicate?

• With whom to communicate?

• How to communicate?

Reducing Communication by Learning

• Learning about the abilities of other agents.

• Learning which agents to ask, instead of broadcasting

• Exploiting similarities between problems (a sketch of learning whom to ask follows)
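A minimal sketch of learning whom to ask: keep a success score per (task type, peer) and address the historically best peer instead of broadcasting. The scoring scheme is an illustrative assumption:

```python
from collections import defaultdict

class DirectoryLearner:
    def __init__(self, peers):
        self.peers = peers
        self.score = defaultdict(float)  # (task_type, agent) -> score

    def whom_to_ask(self, task_type):
        # Ask the peer with the best track record for this kind of task.
        return max(self.peers, key=lambda p: self.score[(task_type, p)])

    def record(self, task_type, agent, success):
        # Reinforce peers that answered usefully.
        self.score[(task_type, agent)] += 1.0 if success else -0.5
```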

Improving Learning by Communication

• Communicating beliefs and pieces of information

• Explanation

• Ontologies

• Finding out complex relationships between different agents and actions

Agenda

• Introduction

• Centralized learning vs decentralized learning

• Credit Assignment Problem

• Learning and Activity Coordination

• Learning about and from other agents

• Learning and Communication

• Summary

Summary

• We have seen the focus move from isolated (individual, centralized) learning to a more diverse flora of learning approaches.

• Besides standard, older ML methods, several new ML algorithms have been proposed.

• Agents learn to improve communication and cooperation.

Further reading

• Peter Stone, Ph.D. thesis

• Weiss (course material), chapter 6

• Russell and Norvig, Artificial Intelligence: A Modern Approach

• THE END