
Intelligent Agents

(Chapter 2)

Autonomy is a spectrum

Fully realized human beings – highly autonomous

Objects – non-autonomous

Midpoint – partly controlled by a human:
• goals are delegated at a high level
• autonomy is adjustable, with control handed back to the human when:
  – the agent believes the human is better placed to decide
  – there is a high degree of uncertainty about the environment
  – a decision is possibly harmful, so a second opinion is wanted
  – the agent lacks the capability to make the decision


An Agent in its Environment

[Figure: the agent takes sensor input from the environment and produces action output that affects the environment.]

Agent Environments

Accessible (complete state information can be obtained) vs inaccessible (most real-world environments are inaccessible)

Deterministic vs non-deterministic: deterministic means each action has a single guaranteed effect; there is no question about the state that will result from the action.

Static (changed only by the agent) vs dynamic: the physical world is highly dynamic.

Discrete vs continuous: discrete means there is a fixed, finite number of actions.

Agent Environments: accessible (complete state information can be obtained) vs inaccessible (most real-world environments)

• An accessible environment is one in which the agent can obtain complete, accurate, up-to-date information about the environment’s state.

• Most moderately complex environments (including, for example, the everyday physical world and the Internet) are inaccessible.

• The more accessible an environment is, the simpler it is to build agents to operate in it.


Agent Environments: deterministic vs non-deterministic. Deterministic means each action has a single guaranteed effect; there is no question about the state that will result from the action.

• The physical world can to all intents and purposes be regarded as non-deterministic.

• Non-deterministic environments present greater problems for the agent designer.


Visit with your neighbor-

Give examples of environments which fall in each of the categories (static, deterministic, accessible, discrete).


Dynamic Environments

The agent must gather information to determine the state of the environment, since it may have changed since the last time the agent looked at it.

Other processes can interfere with the actions the agent attempts to perform, so information gathering must continue even after an action has been selected.


The most complex systems

inaccessible

non-deterministic

dynamic

continuous

Such environments are termed open (Hewitt, 1986)


Intelligent Agents

• reactivity
• proactiveness
• social ability

Types of agent systems

Control – takes action based on inputs (e.g., a thermostat)

Software daemons – e.g., an email utility may constantly monitor email and take action

Functional – takes an input, produces an output

Reactive – maintains an ongoing interaction with its environment; there is not a single specific function to be performed


Purely Reactive agents are simple processing units that perceive and react to changes in their environment. Such agents do not have a symbolic representation of the world and do not use complex symbolic reasoning.

• Advocates of reactive agent systems claim that intelligence is not a property of any single active entity; it is distributed across the system and emerges.

• Example – ants/bees in nature


• Intelligence is seen as an emergent property of the entire activity of the system; the model tries to mimic the behaviour of large communities of simple living creatures, such as communities of insects.

• Building purely goal-directed systems is not hard – neither is building purely reactive systems. It is balancing both that is hard. Not surprising, as it is comparatively rare to find humans that do this very well.

• Example – AlphaWolves. Each agent has a personality while also being directed toward other goals by a human participant. How do you howl at the head wolf if you are shy?


A reactive system is one that maintains an ongoing interaction with its environment and responds to changes (in time for the response to be useful).

It must make local decisions which have global consequences.

Consider printer control: the controller may unfairly deny service in the long run, even though each decision seems appropriate in the short term. This is especially likely in episodic (non-history-sensitive) designs.


A little intelligence goes a long way.

Oren Etzioni (speaking about the commercial experience of NETBOT, Inc): We made our agents dumber and dumber and dumber until finally they made money!

NetBot’s Jango represents one of the most visible uses of agents on the Internet, in this case an application that helps users do comparison shopping on the Web.

Explain why you think it was necessary to make the agents dumber and dumber.


• intentional systems, namely systems “whose behaviour can be predicted by the method of attributing belief, desires and rational acumen” (Dennett, 1987).

• Dennett identifies different “grades” of intentional systems: A first order intentional system has beliefs and desires but no beliefs and desires about beliefs and desires. A second order system does.


first order: I desire an A in the class
second order: I desire that you should desire an A in the class

first order: I believe you are honest
second order: I desire to believe you are honest
second order: I believe you believe you are honest

Shoham: such a mentalistic or intentional view of agents is not just another invention of computer scientists but is a useful paradigm for describing complex distributed systems.

BDI architecture


Abstract Architectures for Agents

Assume the environment may be in any of a finite set E of discrete, instantaneous states:

E = {e0, e1, e2, …}

Agents have a repertoire of possible actions, which transform the state of the environment:

Ac = {α0, α1, α2, α3, …}

A run, r, is a sequence of interleaved environment states and actions:

r : e0 --α0--> e1 --α1--> e2 --α2--> e3 --α3--> … --αu−1--> eu
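To make these definitions concrete, here is a minimal Python sketch of a state set E, an action set Ac, and a run as an interleaved sequence; all names and the validity check are illustrative, not from the slides.

```python
# A minimal sketch of the abstract model: environment states, actions,
# and a run stored as an interleaved sequence e0, a0, e1, a1, ..., eu.
E = {"e0", "e1", "e2"}          # environment states
Ac = {"a0", "a1"}               # repertoire of actions

run = ["e0", "a0", "e1", "a1", "e2"]   # starts and ends with a state

def is_valid_run(r):
    """Check that r alternates states and actions and ends in a state."""
    if not r or r[0] not in E or r[-1] not in E:
        return False
    for i, x in enumerate(r):
        expected = E if i % 2 == 0 else Ac
        if x not in expected:
            return False
    return True

print(is_valid_run(run))  # True
```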


Let

– R be the set of all such finite sequences (of states and actions)
– RAc be the subset of these that end with an action
– RE be the subset of these that end with an environment state
– R = RAc ∪ RE


State Transformer Functions

Recall that the power set P({1,2,3}) is the set of all subsets = {{}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}.

The state transformer function τ (tau) represents the behavior of the environment:

τ : RAc → P(E)

Note that the result of applying τ is non-deterministic (and hence maps to the power set of environment states). Note that environments are

– history dependent (dependent on the whole run)
– non-deterministic (τ maps to a set of possible states)


If τ(r) = ∅, then there are no possible successor states to r. The system has ended the run.

Formally, an environment Env is a triple Env = ⟨E, e0, τ⟩, where E is a set of environment states, e0 ∈ E is the initial state, and τ is the state transformer function.
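A minimal Python sketch of a non-deterministic state transformer and the environment triple; the state and action names are illustrative assumptions, not from the slides.

```python
import random

# tau maps a run ending with an action to a *set* of possible successor
# states; an empty set means the run has ended. Names are illustrative.
def tau(run):
    last_action = run[-1]
    if last_action == "heat":
        return {"warm", "hot"}       # non-deterministic outcome
    if last_action == "cool":
        return {"cool", "cold"}
    return set()

E = {"cold", "cool", "warm", "hot"}
e0 = "cold"
Env = (E, e0, tau)                   # Env = <E, e0, tau>

run = [e0, "heat"]                   # a run ending in an action
successors = tau(run)
if successors:
    run.append(random.choice(sorted(successors)))  # environment picks one state
print(run)
```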


Agents

An agent maps runs (ending in an environment state) into actions:

Ag : RE → Ac

AG denotes the set of all agents; an agent decides what action to perform based on the entire history of the system. Since the environment is non-deterministic, even if the agent is deterministic, we cannot be sure what the results of our actions will be.


For Airline Reservations: Is this history dependent?

e0: current credit card balance and current reservations

Ac: set of possible actions

τ: What changes after each action?

If deterministic, how many next states are possible?


Purely Reactive Agents

Some agents decide what to do without reference to their history

action : E → Ac   (not dependent on the run)

A thermostat is purely reactive:

action(e) = off  if e is within limits
          = on   otherwise
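A minimal Python sketch of the thermostat as a purely reactive agent; the numeric temperature band is an assumed, illustrative value.

```python
# A purely reactive agent: the action depends only on the current
# environment state, never on the run history.
LOW, HIGH = 18.0, 22.0   # acceptable temperature band (assumed values)

def action(e: float) -> str:
    """Map the current state (temperature) directly to an action."""
    return "off" if LOW <= e <= HIGH else "on"

print(action(20.5))  # off
print(action(15.0))  # on
```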


Perception

The see function represents the agent’s ability to observe the environment.

The action function represents the agent’s decision-making process.

The output of see is a percept: see : E → Per

action : Per* → Ac (maps sequences of percepts into actions)
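A minimal Python sketch of see and action; the environment fields and the "two cold percepts in a row" rule are illustrative assumptions.

```python
# see maps an environment state to a percept; action maps the sequence
# of percepts observed so far to an action.
def see(e: dict) -> str:
    """Reduce the full environment state to what the agent can perceive."""
    return "warm" if e["temp"] > 22.0 else "cold"

def action(percepts: list) -> str:
    """Decide based on the whole percept sequence (here: the last two)."""
    if percepts[-2:] == ["cold", "cold"]:
        return "heat"
    return "wait"

history = [see({"temp": 18.0}), see({"temp": 17.5})]
print(action(history))  # heat
```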


Perception

Now introduce a perception system, splitting the agent into a see subsystem and an action subsystem:

[Figure: the environment feeds into see, whose output feeds into action, which acts back on the environment.]


Another graphic of the same concept

Mars explorer – an example of a reactive system

Mars explorer (L. Steels) objective:

• to explore a distant planet and, in particular, to collect samples of a precious rock
• the location of the samples is not known in advance, but it is known that they tend to be clustered
• obstacles impede motion in some cells
• the mother ship broadcasts a radio signal that weakens with distance
• no map is available
• collaborative, stigmergic – indirect coordination


[Figure: a grid environment showing the mother ship, autonomous vehicles, and samples of precious rock.]

At seats:

Explain how the Mars Explorer problem could be solved reactively.

Explain how the problem could be solved if you build a model and communicate with other agents directly.


Mars explorer (cont.)

single explorer reactive solution:

– behaviours / rules:

1. if obstacle then change direction
2. if carrying samples and at base then drop them
3. if carrying samples and not at base then travel toward ship
4. if detect sample then pick it up
5. if true then walk randomly

– total order relation on the preference of actions: 1 < 2 < 3 < 4 < 5 (rule 1 has the highest priority)
– What are the issues in finding more of a clustered sample? (See the sketch below.)
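A minimal Python sketch of this rule ordering, checked in priority order; the percept field names (obstacle, carrying, at_base, sample_here) are illustrative assumptions, not from the slides.

```python
# Single-explorer reactive controller: rules are checked in priority
# order and the first whose condition fires selects the action.
def decide(percept: dict) -> str:
    if percept["obstacle"]:
        return "change direction"                    # rule 1
    if percept["carrying"] and percept["at_base"]:
        return "drop samples"                        # rule 2
    if percept["carrying"] and not percept["at_base"]:
        return "travel toward ship"                  # rule 3
    if percept["sample_here"]:
        return "pick up sample"                      # rule 4
    return "walk randomly"                           # rule 5 (default)

print(decide({"obstacle": False, "carrying": True,
              "at_base": False, "sample_here": False}))
# travel toward ship
```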


Mars explorer (cont.) – multiple explorer solution?

– Can we just replicate the single-agent solution?
– If one agent finds a cluster of rocks, should it communicate? What about range? Position? How should other agents deal with such messages? They may be far off…
– Indirect communication – stigmergic:
• each agent carries “radioactive crumbs”, which can be dropped, picked up, and detected by passing robots
• communication via the environment is called stigmergy (a form of self-organization)


Example – Mars explorer (cont.): solution inspired by ant foraging behaviour

• an agent creates a “trail” of radioactive crumbs back to the mother ship whenever it finds a rock sample

• if another agent comes across a trail, it can follow it to the sample cluster

refinement:
• an agent following a trail to the samples picks up some crumbs to make the trail fainter
• the trail leading to an empty cluster will eventually be removed


Example – Mars explorer (cont.): modified rule set

1. if detect an obstacle then change direction
2. if carrying samples and at the base then drop samples
3. if carrying samples and not at the base then drop 2 crumbs and travel toward ship
4. if detect a sample then pick up sample
5. if sense crumbs then pick up 1 crumb and travel away from ship
6. if true then move randomly (nothing better to do)

order relation: 1 < 2 < 3 < 4 < 5 < 6

This achieves near-optimal performance in many situations. It is a cheap and robust solution (the loss of a single agent is not critical). L. Steels argues that (deliberative) agents are “entirely unrealistic” for this problem. (A sketch of the modified rule set follows.)
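A minimal Python sketch of the crumb-based (stigmergic) rule set; the shared crumbs map stands in for the environment, and all field names are illustrative assumptions.

```python
import random

crumbs = {}   # position -> number of radioactive crumbs (shared environment)

def decide(agent: dict) -> str:
    pos = agent["pos"]
    if agent["obstacle"]:
        return "change direction"                            # rule 1
    if agent["carrying"] and agent["at_base"]:
        return "drop samples"                                # rule 2
    if agent["carrying"]:
        crumbs[pos] = crumbs.get(pos, 0) + 2                 # rule 3
        return "travel toward ship"
    if agent["sample_here"]:
        return "pick up sample"                              # rule 4
    if crumbs.get(pos, 0) > 0:
        crumbs[pos] -= 1                                     # rule 5
        return "travel away from ship"
    return random.choice(["north", "south", "east", "west"]) # rule 6

print(decide({"pos": (3, 4), "obstacle": False, "carrying": True,
              "at_base": False, "sample_here": False}))
print(crumbs)  # {(3, 4): 2}
```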


At seats

List the advantages and disadvantages of a reactive approach to the Mars explorer.


Mars explorer (cont.)

advantages
– simple (no symbolic model of the world)
– economic (small, special-purpose agents)
– computationally tractable (no centralized control)
– robust against failure (no bottleneck)

disadvantages
– agents act short-term since they use only local information
– no learning
– hard to engineer such agents: difficult if more than about 10 rules interact
– no formal tools to analyse and predict behaviour


History Sensitive

A system is history sensitive if its behaviour depends not only on the current state but also on HOW you got there.

For example, suppose I am trying to divide you into pairs for a class discussion. I look at your physical placement in the class to decide what pairs would work well together.

Suppose that a person is sitting remotely BECAUSE they have hostile feelings towards someone, and that if I watch the environment over time, I can see the person moving further and further away. If the history behind their placement (which I cannot see from the current environment alone) would be helpful in making the pairing decision, we call the decision history sensitive.


Agents with State (NOT purely reactive)

We now consider agents that maintain state:

[Figure: as before, see takes input from the environment and action produces output, but the agent now also has an internal state updated by a next function between see and action.]


Agents with State

These agents have some internal data structure, which is typically used to record information about the environment’s current state and past history – more than what they can see in the current environment. They may attempt to model the world.

**Describe the Mars explorer with state.

Let I be the set of all internal states of the agent.

The perception function see for a state-based agent is unchanged:

see : E → Per

The action-selection function action is now defined as a mapping

action : I → Ac

from internal states to actions. An additional function next is introduced, which maps an internal state and a percept to an internal state:

next : I × Per → I
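A minimal Python sketch of a state-based agent, assuming the internal state is simply a count of consecutive "cold" percepts; this choice and the field names are illustrative, not from the slides.

```python
def see(e: dict) -> str:
    """Perception: reduce the environment state to a percept."""
    return "cold" if e["temp"] < 18.0 else "ok"

def next_state(i: int, percept: str) -> int:
    """next : I x Per -> I (update the internal state with the new percept)."""
    return i + 1 if percept == "cold" else 0

def action(i: int) -> str:
    """action : I -> Ac (decide purely from the internal state)."""
    return "heat" if i >= 2 else "wait"

i = next_state(0, see({"temp": 17.0}))   # i = 1
print(action(i))                          # wait
```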


Agent Control Loop

1. Agent starts in some initial internal state i0 at time t = 0

2. Loop

a. Observes its environment state et, and generates a percept see(et)

b. Internal state of the agent is then updated via next function, becoming it+1= next(it, see(et))

c. The action selected by the agent is action(next(it, see(et)))

d. t++
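A minimal Python sketch of this control loop, reusing the same illustrative see / next_state / action functions as in the previous sketch; the environment trace is an assumed example.

```python
def see(e): return "cold" if e["temp"] < 18.0 else "ok"
def next_state(i, p): return i + 1 if p == "cold" else 0
def action(i): return "heat" if i >= 2 else "wait"

environment_trace = [{"temp": 17.0}, {"temp": 16.5}, {"temp": 21.0}]  # assumed

i = 0                                    # 1. start in initial internal state i0
for e in environment_trace:              # 2. loop
    percept = see(e)                     #    a. observe environment, get percept
    i = next_state(i, percept)           #    b. update internal state
    a = action(i)                        #    c. select action
    print(percept, i, a)                 #    d. t advances with each iteration
```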


Tasks for Agents

We build agents in order to carry out tasks for us

The task must be specified by us…

But we want to tell agents what to do without telling them how to do it


Utility Functions over States

One possibility: associate utilities with individual states — the task of the agent is then to bring about states that maximize utility.

A task specification is a function

u : E → ℝ

which associates a real number with every environment state. How do we specify the task to be carried out? By telling the system what states we like.

Utilities

Normally utilities show a degree of happiness – cardinal (shows quality of solution) rather than ordinal (shows ranking – first, second, third…).

A more restricted situation is when a state is either good or bad (success or failure). This is a binary preference function or a predicate utility.
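A minimal Python sketch contrasting the two kinds of task specification; the state names and utility values are illustrative assumptions.

```python
# u : E -> R (cardinal utility over states)
utility = {"e0": 0.0, "e1": 0.5, "e2": 1.0}

def predicate_utility(e: str) -> bool:
    """Binary preference: a state is simply good (True) or bad (False)."""
    return utility[e] >= 1.0

best = max(utility, key=utility.get)     # state the agent should bring about
print(best, predicate_utility(best))     # e2 True
```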


We need utilities to act in reasonable ways – Preference Relation

Consider some abstract set C with elements ci. Thus, C = {ci : i ∈ I}, where I is some index set. For example, C can be the set of consequences that can arise from taking an action in a particular state.

The following notation describes a preference relation between various elements of C:

• ci ≻ cj : ci is preferred to cj.

• ci ∼ cj : the agent is indifferent between ci and cj; the two elements are equally preferred.

• ci ⪰ cj : ci is at least as preferred as cj.


Axioms of Utility Functions


The Utility Theorem simply says that if an agent has a preference relation that satisfies the axioms of preference, then a real-valued utility function can be constructed that reflects this preference relation.

When we seek to compare alternatives, we look for a mix between alternatives.

The notation [p, A; 1−p, B] denotes a lottery where, with probability p, the option A is won and, with probability 1−p, the option B is won.

Example

Suppose you had three ranked choices for candy bar

Snickers > Baby Ruth > Milky Way

Given your understanding of lotteries, which of the two choices do you prefer?

[.75, Baby Ruth; .25, Milky Way]

[.50, Baby Ruth; .50, Milky Way]
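A worked Python sketch of comparing the two lotteries, assuming illustrative utility values consistent with the ranking Snickers > Baby Ruth > Milky Way; the particular numbers are not from the slides.

```python
u = {"Snickers": 1.0, "Baby Ruth": 0.6, "Milky Way": 0.2}  # assumed values

def expected_utility(lottery):
    """lottery is a list of (probability, outcome) pairs."""
    return sum(p * u[outcome] for p, outcome in lottery)

l1 = [(0.75, "Baby Ruth"), (0.25, "Milky Way")]
l2 = [(0.50, "Baby Ruth"), (0.50, "Milky Way")]
print(expected_utility(l1), expected_utility(l2))  # 0.5 vs 0.4 -> prefer l1
```

Since Baby Ruth is preferred to Milky Way, the first lottery puts more weight on the preferred outcome and is preferred whatever exact utility values are chosen.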



Constructing the Utility

Example. You are graduating from college soon, and you have four job offers: one from Microsoft (as a programmer), one from McDonald’s (as a hamburger maker), one from Walmart (as a checkout clerk), and one from Sun (as a tester). Suppose that your preferences are as follows:

Microsoft ≻ Sun ≻ Walmart ≻ McDonald’s.

Construct a utility that represents this preference pattern.
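A minimal Python sketch of one such construction; any assignment that decreases strictly along the ranking works, and the particular numbers below are an illustrative choice.

```python
offers = ["Microsoft", "Sun", "Walmart", "McDonald's"]   # most to least preferred
u = {offer: 1.0 - i / (len(offers) - 1) for i, offer in enumerate(offers)}
print(u)  # {'Microsoft': 1.0, 'Sun': ~0.67, 'Walmart': ~0.33, "McDonald's": 0.0}

# Check that the utility reflects the preference relation:
assert u["Microsoft"] > u["Sun"] > u["Walmart"] > u["McDonald's"]
```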


What are the advantages and disadvantages of utilities?


Difficulties with utility-based approaches:
– where do the numbers come from?
– we don’t always think in terms of utilities!
– hard to formulate tasks in these terms
– exponentially many states – extracting utilities may be difficult; a simple additive form doesn’t express substitutes or complements

Advantages:
– human-like – maximize pleasure
– aids reuse – change the rewards, get new behavior
– flexible – can adapt to changes in the environment (a new opportunity arises, an option becomes less advantageous)


Expected Utility & Optimal Agents

Write P(r | Ag, Env) to denote the probability that run r occurs when agent Ag is placed in environment Env, noting the non-deterministic results: we don’t know what state will result from our action.

Note that the probabilities of all possible runs sum to 1:

Σ_{r ∈ R(Ag,Env)} P(r | Ag, Env) = 1


Expected Utility (i.e., average utility) & Optimal Agents

The expected utility of agent Ag in environment Env is

EU(Ag, Env) = Σ_{r ∈ R(Ag,Env)} u(r) · P(r | Ag, Env)

Then the optimal agent Agopt in an environment Env is the one that maximizes expected utility:

Agopt = arg max_{Ag ∈ AG} EU(Ag, Env)

arg max says: return the argument which maximizes the formula.
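A minimal Python sketch of expected utility and optimal-agent selection over an enumerable set of runs; the run probabilities and utilities are assumed values, not taken from the slides.

```python
def expected_utility(runs):
    """runs: list of (probability, utility) pairs for one agent/environment."""
    assert abs(sum(p for p, _ in runs) - 1.0) < 1e-9   # probabilities sum to 1
    return sum(p * u for p, u in runs)

agents = {
    "Ag1": [(0.7, 1.0), (0.3, 0.0)],   # pairs of P(r | Ag, Env) and u(r)
    "Ag2": [(0.4, 1.0), (0.6, 0.4)],
}

# arg max over the set of agents:
ag_opt = max(agents, key=lambda ag: expected_utility(agents[ag]))
print(ag_opt, expected_utility(agents[ag_opt]))  # Ag1 0.7
```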