CS 188: Artificial Intelligence

Transcript of lecture slides: inst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-9.pdf

Page 1

CS 188: Artificial Intelligence

Uncertainty and Utilities


Page 3

Uncertain Outcomes

Page 4

Worst-Case vs. Average Case

[Figure: two game trees over leaf values 10, 2, 5, 5. Worst-case (minimax): child values 2 and 5, root value 5. Average-case (expectimax): child values 6 and 5, root value 6]

Idea: Uncertain outcomes controlled by chance, not an adversary!
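The worst-case vs. average-case comparison can be checked numerically. A minimal sketch, assuming each chance node weights its two children uniformly (the slide does not state the probabilities explicitly):

```python
# Leaf values under the two subtrees from the figure.
leaves = [(10, 2), (5, 5)]

# Worst case (minimax): an adversary picks the minimum in each subtree.
minimax = max(min(group) for group in leaves)

# Average case (expectimax): chance picks uniformly, so take the mean.
expectimax = max(sum(group) / len(group) for group in leaves)

print(minimax, expectimax)  # 5 6.0
```

Note that the average-case agent prefers the subtree containing the risky 2, because the 10 pulls its expectation up to 6.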


Page 11

Expectimax Search

[Figure: expectimax tree with leaf values 8, 2, 5, 6]

Why wouldn't we know what the result of an action will be?
- Explicit randomness: rolling dice
- Random opponents: ghosts respond randomly
- Actions can fail: robot wheels might spin

Values reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

Expectimax search: compute average score under optimal play
- Max nodes as in minimax search
- Chance nodes replace min nodes, but the outcome is uncertain
- Calculate their expected utilities
- I.e. take weighted average (expectation) of children

Later: formalize as Markov Decision Processes

[Demo: min vs exp (L7D1,2)]


Page 24

Video of Demo Minimax vs Expectimax (Min)

Page 25

Video of Demo Minimax vs Expectimax (Exp)

Page 26

Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
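The pseudocode translates directly into runnable Python. The node encoding below (a terminal utility, a MAX node over child states, or an EXP node over (probability, child) pairs) is an assumption made for illustration; the slides leave the state interface abstract.

```python
import math

def value(state):
    """Dispatch on node type, mirroring the slide's value(state)."""
    kind, payload = state
    if kind == "leaf":
        return payload                 # terminal state: return its utility
    if kind == "max":
        return max_value(payload)      # payload: list of child states
    if kind == "exp":
        return exp_value(payload)      # payload: list of (prob, child) pairs
    raise ValueError(f"unknown node kind: {kind}")

def exp_value(children):
    """Chance node: weighted average (expectation) of child values."""
    v = 0.0
    for p, s in children:
        v += p * value(s)
    return v

def max_value(children):
    """Max node: best child value, exactly as in minimax."""
    v = -math.inf
    for s in children:
        v = max(v, value(s))
    return v

# A max node over two uniform chance nodes with leaves (8, 2) and (5, 6):
# expectations are 5.0 and 5.5, so the root value is 5.5.
tree = ("max", [
    ("exp", [(0.5, ("leaf", 8)), (0.5, ("leaf", 2))]),
    ("exp", [(0.5, ("leaf", 5)), (0.5, ("leaf", 6))]),
])
print(value(tree))  # 5.5
```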


Page 40

Expectimax Pseudocode

[Figure: chance node with value 10 and children 8, 24, -12, reached with probabilities 1/2, 1/3, 1/6]

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
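The worked calculation can be checked directly, with the probabilities and values taken from the slide:

```python
probs = [1/2, 1/3, 1/6]
vals = [8, 24, -12]

# A valid distribution: weights sum to one.
assert abs(sum(probs) - 1.0) < 1e-12

# Weighted average (expectation) of the children.
v = sum(p * x for p, x in zip(probs, vals))
print(v)  # 10.0
```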


Page 50

Expectimax Example

[Figure: max node over three uniform chance nodes with leaves (3, 12, 9), (2, 4, 6), (15, 6, 0); chance values 8, 4, 7; root value 8]
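The example tree can be evaluated in two lines, assuming (as the figure suggests) that each chance node weights its three children uniformly:

```python
# Leaf groups under the three chance nodes from the figure.
leaves = [(3, 12, 9), (2, 4, 6), (15, 6, 0)]

# Uniform chance nodes: each value is the mean of its children.
chance_values = [sum(group) / len(group) for group in leaves]
print(chance_values)        # [8.0, 4.0, 7.0]

# Root is a max node.
print(max(chance_values))   # 8.0
```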


Page 56

Expectimax Pruning?


Page 58

Depth-Limited Expectimax

Estimate of the true expectimax value (versus a lot of work to compute it exactly)
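A minimal sketch of the depth-limited variant. The tuple node encoding (terminal utility, MAX over children, EXP over (probability, child) pairs) and the `evaluate` heuristic are illustrative assumptions, not from the slides: below the cutoff depth, the recursion is replaced by the heuristic estimate.

```python
def dl_value(state, depth, evaluate):
    """Depth-limited expectimax: at depth 0, substitute a heuristic estimate."""
    kind, payload = state
    if kind == "leaf":
        return payload
    if depth == 0:
        return evaluate(state)   # estimate instead of searching further
    if kind == "max":
        return max(dl_value(s, depth - 1, evaluate) for s in payload)
    if kind == "exp":
        return sum(p * dl_value(s, depth - 1, evaluate) for p, s in payload)
    raise ValueError(f"unknown node kind: {kind}")

# With a generous depth the limit never triggers, so the value is exact;
# with depth 0 we get only the heuristic guess for the root.
tree = ("max", [("exp", [(0.5, ("leaf", 8)), (0.5, ("leaf", 2))]),
                ("exp", [(0.5, ("leaf", 5)), (0.5, ("leaf", 6))])])
guess = lambda s: 0.0            # trivially pessimistic evaluation function
print(dl_value(tree, 4, guess))  # 5.5 (exact)
print(dl_value(tree, 0, guess))  # 0.0 (pure estimate)
```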


Page 61

Probabilities

Page 62

Reminder: Probabilities

[Figure: distribution over traffic outcomes with weights 0.25, 0.50, 0.25]

Random variable picks an outcome

Probability distribution assigns weights to outcomes

Example: Traffic on freeway
- Random variable: T = there's traffic
- Outcomes: T in {none, light, heavy}
- Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

Some laws of probability (more later):
- Probabilities are always non-negative
- Probabilities over all possible outcomes sum to one

As we get more evidence, probabilities may change:
- P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
- Reasoning about and updating probabilities comes later
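The two laws and the expectation idea can be checked with the traffic distribution from the slide; the per-outcome delay numbers are invented here purely to illustrate a weighted average.

```python
# Distribution over traffic outcomes, from the slide.
P = {"none": 0.25, "light": 0.50, "heavy": 0.25}

assert all(p >= 0 for p in P.values())       # probabilities are non-negative
assert abs(sum(P.values()) - 1.0) < 1e-12    # outcomes sum to one

# Hypothetical delay (minutes) per outcome, to show an expectation:
delay = {"none": 0, "light": 10, "heavy": 30}
expected_delay = sum(P[t] * delay[t] for t in P)
print(expected_delay)  # 0.25*0 + 0.50*10 + 0.25*30 = 12.5
```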

Page 75: CS 188: Artificial Intelligence

Reminder: Expectations

The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes.

Example: How long to get to the airport?

0.25 × 20 min + 0.50 × 60 min + 0.25 × 32 min = 43 min
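The weighted average above can be written as a one-line computation; this helper is a sketch (the dictionary format mapping outcome values to probabilities is my own):

```python
def expected_value(distribution):
    """Expected value of a random variable given {outcome_value: probability}."""
    return sum(p * v for v, p in distribution.items())

# Airport travel time from the slide: minutes -> probability
travel_time = {20: 0.25, 60: 0.50, 32: 0.25}
print(expected_value(travel_time))  # 43.0
```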

Page 81: CS 188: Artificial Intelligence

What Probabilities to Use?

In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state:
- The model could be a simple uniform distribution (roll a die)
- The model could be sophisticated and require a great deal of computation
- We have a chance node for any outcome out of our control: opponent or environment
- The model might say that adversarial actions are likely!

For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes.

Having a probabilistic belief about another agent's action does not mean that the agent is flipping any coins!

Page 89: CS 188: Artificial Intelligence

Quiz: Informed Probabilities

Let's say you know that your opponent is actually running a depth-2 minimax, using the result 80% of the time, and moving randomly otherwise.

Question: What tree search should you use?

(Diagram: chance node with outcome probabilities 0.1 and 0.9)

Answer: Expectimax!
- To figure out EACH chance node's probabilities, you must run a simulation of your opponent
- This gets very slow very quickly
- Even worse if you must simulate your opponent simulating you
- ... except for minimax, which has the nice property that it all collapses into one game tree
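The 80/20 opponent model in this quiz induces a concrete distribution at each chance node: with two legal moves, the minimax choice gets 0.8 + 0.2/2 = 0.9 and the other move gets 0.1, matching the labels in the diagram. A sketch, where `legal_moves` and `minimax_move` are illustrative stand-ins for the real game interface:

```python
def opponent_distribution(state, legal_moves, minimax_move, mix=0.8):
    """Move distribution for an opponent that plays minimax with prob `mix`
    and uniformly at random otherwise."""
    moves = legal_moves(state)
    best = minimax_move(state)          # requires simulating the opponent's search!
    uniform = (1.0 - mix) / len(moves)  # random-play mass spread over all moves
    return {m: uniform + (mix if m == best else 0.0) for m in moves}

# Two legal moves; the opponent's depth-2 minimax would pick "left"
dist = opponent_distribution("s0", lambda s: ["left", "right"], lambda s: "left")
print({m: round(p, 3) for m, p in dist.items()})  # {'left': 0.9, 'right': 0.1}
```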

Page 98: CS 188: Artificial Intelligence

Modeling Assumptions

Page 99: CS 188: Artificial Intelligence

The Dangers of Optimism and Pessimism

Dangerous Optimism

Assuming chance when the world is adversarial

Dangerous Pessimism

Assuming the worst case when it’s not likely

Page 104: CS 188: Artificial Intelligence

Assumptions vs. Reality

[Demos: world assumptions (L7D3,4,5,6)]

Results from playing 5 games:

             Adversarial Ghost    Random Ghost
Minimax      5/5, Avg: 483        5/5, Avg: 493
Expectimax   1/5, Avg: -303       5/5, Avg: 503

Pacman used depth-4 search with an eval function that avoids trouble; the ghost used depth-2 search with an eval function that seeks Pacman.

Page 112: CS 188: Artificial Intelligence

Demo Video: Random Ghost – Expectimax Pacman

Page 113: CS 188: Artificial Intelligence

Demo Video: Minimax Pacman

Page 114: CS 188: Artificial Intelligence

Demo Video: Ghost – Expectimax Pacman

Page 115: CS 188: Artificial Intelligence

Demo Video: Random Ghost – Minimax Pacman

Page 116: CS 188: Artificial Intelligence

Other Game Types

Page 117: CS 188: Artificial Intelligence

Mixed Layer Types

E.g., Backgammon

Expectiminimax:
- The environment is an extra "random agent" player that moves after each min/max agent
- Each node computes the appropriate combination of its children
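The expectiminimax recursion described above alternates max, min, and chance layers; a minimal sketch over a toy tree encoding (the ('kind', payload) tuple format is my own, not from the slides):

```python
def expectiminimax(node):
    """Value of a game-tree node; node = ('max'|'min'|'chance'|'leaf', payload).
    Chance children are (probability, subtree) pairs."""
    kind, payload = node
    if kind == 'leaf':
        return payload                                     # terminal utility
    if kind == 'max':
        return max(expectiminimax(c) for c in payload)     # our move
    if kind == 'min':
        return min(expectiminimax(c) for c in payload)     # adversary's move
    return sum(p * expectiminimax(c) for p, c in payload)  # dice: expectation

# Max over two chance nodes (e.g. our move, then a die roll, then leaves)
tree = ('max', [
    ('chance', [(0.5, ('leaf', 8)), (0.5, ('leaf', 24))]),   # expectation 16
    ('chance', [(0.5, ('leaf', 10)), (0.5, ('leaf', 12))]),  # expectation 11
])
print(expectiminimax(tree))  # 16.0
```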

Page 122: Example: Backgammon

- Dice rolls increase b: 21 possible rolls with 2 dice
- Backgammon ≈ 20 legal moves
- Depth 2 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9

- As depth increases, the probability of reaching a given search node shrinks
- So the usefulness of search is diminished
- So limiting depth is less damaging
- But pruning is trickier...

Historic AI: TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play

1st AI world champion in any game!

Image: Wikipedia

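The node count on the slide is a quick back-of-envelope product; the exact value of 20 × (21 × 20)^3 comes out to about 1.5 × 10^9, which the slide quotes as ≈1.2 × 10^9:

```python
# Back-of-envelope check of the backgammon branching numbers.
moves = 20                        # ~legal moves per position
rolls = 21                        # distinct rolls of two dice: 6 doubles + 15 others
nodes = moves * (rolls * moves) ** 3
print(f"{nodes:,}")               # -> 1,481,760,000 (about 1.5e9)
```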

Page 133: Multi-Agent Utilities

[Game-tree figure; root value (1, 6, 6)]

What if the game is not zero-sum, or has multiple players?

Generalization of minimax:
- Terminals have utility tuples
- Node values are also utility tuples
- Each player maximizes its own component
- Can give rise to cooperation and competition dynamically...

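The tuple-valued generalization can be sketched in Python. The tree encoding is invented for illustration: an internal node is `(player_index, [children])`, a leaf is a tuple of utilities (one per player), and the player to move picks the child whose value is best for its own component.

```python
def multi_value(node):
    """Value (a utility tuple) of a node in a multi-agent game tree."""
    if isinstance(node, tuple) and all(isinstance(x, (int, float)) for x in node):
        return node                                  # terminal: utility tuple
    player, children = node
    # The player to move maximizes its own component of the tuple.
    return max((multi_value(c) for c in children), key=lambda v: v[player])

# Three players; player 0 moves at the root, players 1 and 2 below.
tree = (0, [
    (1, [(1, 6, 6), (7, 1, 2)]),   # player 1 prefers (1, 6, 6)
    (2, [(5, 1, 1), (2, 5, 7)]),   # player 2 prefers (2, 5, 7)
])
print(multi_value(tree))           # -> (2, 5, 7)
```

Note that player 0 gets 2 here rather than the 7 available in the left subtree, because player 1 would never let it reach (7, 1, 2): this is where cooperation and competition emerge dynamically.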

Page 142: Utilities

Page 143: Maximum Expected Utility

Why should we average utilities? Why not minimax?

Principle of maximum expected utility:
- A rational agent should choose the action that maximizes its expected utility, given its knowledge

Questions:
- Where do utilities come from?
- How do we know such utilities even exist?
- How do we know that averaging even makes sense?
- What if our behavior (preferences) can’t be described by utilities?

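The principle itself is one line of code. This is a minimal sketch with a made-up action model (each action maps to a list of (probability, utility) outcome pairs):

```python
def expected_utility(outcomes):
    """Expected utility of a list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def meu_action(actions):
    """Pick the action with maximum expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

actions = {
    "safe":   [(1.0, 5)],              # EU = 5.0
    "gamble": [(0.5, 10), (0.5, 2)],   # EU = 6.0
}
print(meu_action(actions))             # -> gamble
```

The questions on the slide are about why this averaging is justified at all; the answer comes from the rationality axioms later in the lecture.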

Page 152: What Utilities to Use?

For worst-case minimax reasoning, the scale of the terminal function doesn’t matter:
- We just want better states to have higher evaluations (get the ordering right)
- We call this insensitivity to monotonic transformations

For average-case expectimax reasoning, we need magnitudes to be meaningful.

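The contrast is easy to demonstrate with toy lotteries (values invented for illustration): squaring every payoff is monotonic on these non-negative values, so worst-case reasoning is unaffected, but the expectimax choice flips.

```python
lotteries = {
    "A": [(0.5, 0), (0.5, 40)],   # expectation 20, worst case 0
    "B": [(1.0, 25)],             # expectation 25, worst case 25
}

def best(score):
    # Pick the lottery with the larger score.
    return max(lotteries, key=score)

worst_case = lambda name: min(v for _, v in lotteries[name])
expectation = lambda name: sum(p * v for p, v in lotteries[name])

print(best(worst_case), best(expectation))    # -> B B
# Square every payoff (a monotonic transformation on these values):
lotteries = {k: [(p, v ** 2) for p, v in lot] for k, lot in lotteries.items()}
print(best(worst_case))    # still B: only the ordering matters
print(best(expectation))   # now A: 800 > 625, the average-case decision flipped
```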

Page 157: Utilities

Utilities: functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

Where do utilities come from?
- In a game, may be simple (+1/-1)
- Utilities summarize the agent’s goals
- Theorem: any “rational” preferences can be summarized as a utility function

We hard-wire utilities and let behaviors emerge:
- Why don’t we let agents pick utilities?
- Why don’t we prescribe behaviors?


Page 165: Utilities: Uncertain Outcomes

[Figure: two actions, “Get Single” and “Get Double”, with uncertain outcomes labeled “Oops.” and “Whew!”]


Page 170: Preferences

An agent must have preferences among:
- Prizes: A, B, etc.
- Lotteries: uncertain prizes

Notation:
- Preference: A ≻ B
- Indifference: A ∼ B


Page 177: Rationality

Page 178: Rational Preferences

We want some constraints on preferences before we call them rational, such as:

Axiom of Transitivity: (A ≻ B) ∧ (B ≻ C) ⟹ (A ≻ C)

For example, an agent with intransitive preferences can be induced to give away all of its money:
- If B ≻ C, then an agent with C would pay (say) 1 cent to get B
- If A ≻ B, then an agent with B would pay (say) 1 cent to get A
- If C ≻ A, then an agent with A would pay (say) 1 cent to get C


Page 185: CS 188: Artificial Intelligenceinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-9.pdftActions can fail: robot wheels might spin Values reflect average-case (expectimax) outcomes,

Rational Preferences

Theorem: Rational preferences imply behavior describable asmaximization of expected utility

The Axioms of Rationality

Page 188: CS 188: Artificial Intelligence

MEU Principle

Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: Given any preferences satisfying these constraints, there exists a real-valued function U such that:

U(A) ≥ U(B) ↔ A ⪰ B
U([p1, S1; …; pn, Sn]) = Σi pi U(Si)

I.e., the values assigned by U preserve preferences over both prizes and lotteries!

Maximum expected utility (MEU) principle:
- Choose the action that maximizes expected utility
- Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
- E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
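The MEU rule itself is one line: score each action's outcome lottery by Σi pi U(Si) and take the argmax. A minimal sketch with hypothetical action names and utility values:

```python
def expected_utility(lottery, U):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * U(s) for p, s in lottery)


def meu_action(actions, U):
    """Pick the action whose outcome lottery maximizes expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], U))


U = {"win": 10, "draw": 2, "lose": 0}.get  # illustrative utilities
actions = {
    "safe":  [(1.0, "draw")],                # EU = 2.0
    "risky": [(0.3, "win"), (0.7, "lose")],  # EU = 3.0
}
best = meu_action(actions, U)  # "risky": higher expected utility wins
```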

Page 197: CS 188: Artificial Intelligence

Human Utilities

Page 198: CS 188: Artificial Intelligence

Utility Scales

Normalized utilities: u+ = 1.0, u− = 0.0

Micromorts: one-millionth chance of death; useful for paying to reduce product risks, etc.

QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk

Note: behavior is invariant under positive linear transformation:

U′(x) = k1 U(x) + k2, with k1 > 0

With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes.
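Invariance under positive linear transformation means that U and k1·U + k2 (with k1 > 0) rank every lottery identically, so the MEU choice never changes when the scale changes. A quick check with hypothetical outcomes and lotteries:

```python
def expected_utility(lottery, U):
    # lottery: list of (probability, outcome) pairs
    return sum(p * U(s) for p, s in lottery)


def best(lotteries, U):
    # index of the MEU lottery under utility scale U
    return max(range(len(lotteries)), key=lambda i: expected_utility(lotteries[i], U))


U1 = {"worst": 0.0, "mid": 0.5, "best": 1.0}.get  # normalized scale: u- = 0, u+ = 1
U2 = lambda s: 7 * U1(s) - 3                      # positive linear rescaling, k1 = 7 > 0

lotteries = [
    [(0.5, "worst"), (0.5, "best")],   # EU under U1: 0.50
    [(1.0, "mid")],                    # EU under U1: 0.50
    [(0.9, "best"), (0.1, "worst")],   # EU under U1: 0.90
]
choice1, choice2 = best(lotteries, U1), best(lotteries, U2)
# choice1 == choice2: rescaling the utilities never changes the decision
```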

Page 205: CS 188: Artificial Intelligence

Human Utilities

Utilities map states to real numbers. Which numbers?

Standard approach to assessment (elicitation) of human utilities:
- Compare a prize A to a standard lottery Lp between:
  - the "best possible prize" u+ with probability p
  - the "worst possible catastrophe" u− with probability 1 − p
- Adjust the lottery probability p until indifference: A ∼ Lp
- The resulting p is a utility in [0, 1]
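The adjust-until-indifference procedure can be sketched as a binary search over p, assuming access to a preference oracle (here simulated by a hidden "true" utility, a hypothetical stand-in for a human subject's answers):

```python
def elicit_utility(prefers_lottery, tol=1e-6):
    """Binary-search for the p at which the subject is indifferent
    between prize A and the lottery [p, u+; 1-p, u-].
    prefers_lottery(p) -> True if the subject would take the lottery."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p   # lottery too attractive: indifference point is lower
        else:
            lo = p   # prize still preferred: raise p
    return (lo + hi) / 2


true_u = 0.73  # hidden "true" utility of prize A (hypothetical)
p_hat = elicit_utility(lambda p: p > true_u)
# p_hat converges to the prize's utility, about 0.73
```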

Page 214: CS 188: Artificial Intelligence

Money

Money does not behave as a utility function, but there is utility in having money (or being in debt).

Given a lottery L = [p, $X; (1−p), $Y]:
- Expected monetary value: EMV(L) = p·X + (1−p)·Y
- Expected utility: U(L) = p·U($X) + (1−p)·U($Y)
- Typically, U(L) < U(EMV(L))
- In this sense, people are risk-averse
- When deep in debt, people are risk-prone
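Risk aversion falls out of a concave utility curve: by Jensen's inequality, U(L) < U(EMV(L)). A quick check with a square-root utility (an illustrative stand-in; real human curves differ):

```python
import math


def U(x):
    # A concave (risk-averse) utility of money, for illustration only.
    return math.sqrt(x)


p, X, Y = 0.5, 1000.0, 0.0
EMV = p * X + (1 - p) * Y           # 500.0
U_L = p * U(X) + (1 - p) * U(Y)     # expected utility of the lottery, ~15.81
# Concavity implies the lottery is worth less than its EMV for sure:
# U_L ~ 15.81 < U(EMV) ~ 22.36
```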

Page 223: CS 188: Artificial Intelligence

Example: Insurance

Consider the lottery [0.5, $1000; 0.5, $0]:
- What is its expected monetary value? ($500)
- What is its certainty equivalent?
  - The monetary value acceptable in lieu of the lottery
  - About $400 for most people
- The difference of $100 is the insurance premium
- There's an insurance industry because people will pay to reduce their risk
- If everyone were risk-neutral, no insurance would be needed!
- It's win-win: you'd rather have the $400, and the insurance company would rather have the lottery (their utility curve is flat, and they have many lotteries)
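The certainty equivalent is U⁻¹ of the lottery's expected utility, and the premium is EMV minus that. A sketch using the same illustrative square-root utility (which yields a certainty equivalent of $250 rather than the ~$400 typical of survey answers, since it is more risk-averse than most people):

```python
import math

U = math.sqrt                  # illustrative concave utility
U_inv = lambda u: u * u        # inverse of sqrt on [0, inf)

lottery = [(0.5, 1000.0), (0.5, 0.0)]
EMV = sum(p * x for p, x in lottery)       # 500.0
EU = sum(p * U(x) for p, x in lottery)     # ~15.81
CE = U_inv(EU)                             # 250.0: certainty equivalent
premium = EMV - CE                         # 250.0: what an insurer can charge
```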

Page 234: CS 188: Artificial Intelligenceinst.eecs.berkeley.edu/~cs188/sp20/assets/lecture/lec-9.pdftActions can fail: robot wheels might spin Values reflect average-case (expectimax) outcomes,

Example: Human Rationality?

Famous example of Allais (1953):

- A: [0.8, $4k; 0.2, $0]
- B: [1.0, $3k; 0.0, $0]
- C: [0.2, $4k; 0.8, $0]
- D: [0.25, $3k; 0.75, $0]

Most people prefer B > A and C > D.

But if U($0) = 0, then:

- B > A implies U($3k) > 0.8 U($4k)
- C > D implies 0.8 U($4k) > U($3k)

What's going on? Doh!
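The contradiction can be checked mechanically. A small sketch (the `expected_utility` helper is an assumption, not from the lecture): no assignment of utilities with U($0) = 0 can rationalize both majority preferences, which a random sweep over candidate utilities confirms.

```python
import random

def expected_utility(lottery, U):
    """Expected utility of [(prob, payoff), ...] under utility table U."""
    return sum(p * U[x] for p, x in lottery)

# The four Allais lotteries from the slide.
A = [(0.8, 4000), (0.2, 0)]
B = [(1.0, 3000)]
C = [(0.2, 4000), (0.8, 0)]
D = [(0.25, 3000), (0.75, 0)]

# B > A requires U($3k) > 0.8 U($4k); C > D requires 0.2 U($4k) > 0.25 U($3k),
# i.e. 0.8 U($4k) > U($3k) -- a direct contradiction. Sweep random utilities
# with U($0) = 0 and verify both preferences never hold at once:
for _ in range(10_000):
    U = {0: 0.0, 3000: random.uniform(0, 1), 4000: random.uniform(0, 1)}
    both = (expected_utility(B, U) > expected_utility(A, U)
            and expected_utility(C, U) > expected_utility(D, U))
    assert not both
print("no expected-utility agent exhibits both preferences")
```

So the common human pattern B > A, C > D is inconsistent with maximizing any expected utility; this certainty-effect anomaly is what made the Allais example famous.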


Next Time: MDPs!