Lirong Xia Utilities and decision theory Friday, Feb 28, 2014.
-
Upload
samson-stafford -
Category
Documents
-
view
216 -
download
0
Transcript of Lirong Xia Utilities and decision theory Friday, Feb 28, 2014.
Lirong Xia
Utilities and decision theory
Friday, Feb 28, 2014
• Midterm Mar 7
– in-class
– open book and lecture notes
– simple calculators are allowed
– cannot use smartphone/laptops/wifi
– practice exams and solutions (check piazza)
• Project 2 deadline Mar 18 midnight
2
Reminder
• Given random variables
Z1,…Zp, we are asked
whether X Y|Z⊥ 1,…Zp
– dependent if there exists a
path where all triples are
active
– independent if for each path,
there exists an inactive triple3
Checking conditional independence from BN graph
• Compute a marginal probability p(x1,…,xp) in a
Bayesian network
– Let Y1,…,Yk denote the remaining variables
– Step 1: fix an order over the Y’s (wlog Y1>…>Yk)
– Step 2: rewrite the summation as
Σy1 Σy2
…Σyk-1 Σyk
anything
– Step 3: variable elimination from right to left 4
General method for variable elimination
sth only
involving
X’s
sth only
involving Y1
and X’s
sth only
involving Y1,
Y2 and X’s
sth only
involving Y1,
Y2,…,Yk-1
and X’s
• Utility theory
– expected utility: preferences over lotteries
– maximum expected utility (MEU) principle
5
Today
Expectimax Search Trees
6
• Expectimax search– Max nodes (we) as in minimax search– Chance nodes
• Need to compute chance node values as expected utilities
• Next class we will formalize the
underlying problem as a Markov
decision Process
Expectimax Pseudocode
7
• Def value(s):If s is a max node return maxValue(s)If s is a chance node return expValue(s)If s is a terminal node return evaluations(s)
• Def maxValue(s):values = [value(s’) for s’ in successors(s)]
return max(values)
• Def expValue(s):values = [value(s’) for s’ in successors(s)]weights = [probability(s, s’) for s’ in successors(s)]
return expectation(values, weights)
Expectimax Quantities
8
Maximum expected utility
9
• Why should we average utilities? Why not
minimax?
• Principle of maximum expected utility:– A rational agent should chose the action which
maximizes its expected utility, given its (probabilistic) knowledge
• Questions:– Where do utilities come from?– How do we know such utilities even exist?– Why are we taking expectations of utilities (not, e.g.
minimax)?– What if our behavior can’t be described by utilities?
Inference with Bayes’ Rule
10
• Example: diagnostic probability from causal probability:
• Example:– F is fire, {f, ¬f}
– A is the alarm, {a, ¬a}
– Note: posterior probability of fire still very small– Note: you should still run when hearing an alarm! Why?
Causep Effect Cause p Cause
p Effectp Effect
p(f)=0.01
p(a|f)=0.9p(a)=0.1
11
0.009
stay
run out
0.991
100%
Utilities
12
• Utilities are functions from outcomes
(states of the world, sample space) to
real numbers that represent an
agent’s preferences
• Where do utilities come from?– In a game, may be simple (+1/-1)– Utilities summarize the agent’s goals
-10100
100
-100
Preferences over lotteries
13
• An agent chooses among:– Prizes: A, B, etc.– Lotteries: situations with
uncertain prizes
• Notation:
L
A
B
p
1-p
A is strictly preferred to B
In difference between A and B
A is strictly preferred to or indifferent with B
• How many lotteries?
– infinite!
• Need to find a compact representation
– Maximum expected utility (MEU) principle
– which type of preferences (rankings over
lotteries) can be represented by MEU?
14
Encoding preferences over lotteries
Rational Preferences
15
• We want some constraints on preferences before we call them rational
• For example: an agent with intransitive preferences can be induced to give away all of its money– If B>C, then an agent with C would
pay (say) 1 cent to get B– If A>B, then an agent with B would
pay (say) 1 cent to get A– If C>A, then an agent with A would
pay (say) 1 cent to get C
Rational Preferences
16
• Preference of a rational agent must obey constraints– The axioms of rationality: for all lotteries A, B, C
Orderability
Transitivity
Continuity
Substitutability
Monotonicity
• Theorem: rational preferences imply behavior describable as maximization of expected utility
MEU Principle
17
• Theorem:– [Ramsey, 1931; von Neumann & Morgenstern, 1944]– Given any preference satisfying these axioms, there exists a
real-value function U such that:
• Maximum expected utility (MEU) principle:– Choose the action that maximizes expected utility– Utilities are just a representation!
• an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
– Utilities are NOT money
18
0.009
stay
run out
0.991
100%
What would you do?
-10100
100
-100
19
Common types of utilities
• State of the world: money you earn– Money does not behave as a utility function, but we can
talk about the utility of having money (or being in debt)• Which would you prefer?
– A lottery ticket that pays out $10 with probability .5 and $0 otherwise, or
– A lottery ticket that pays out $3 with probability 1
• How about:– A lottery ticket that pays out $100,000,000 with probability .5 and
$0 otherwise, or– A lottery ticket that pays out $30,000,000 with probability 1
• Usually, people do not simply go by expected value
20
Utility theory in Economics
Risk attitudes
• An agent is risk-neutral if she only cares about the expected value of the lottery ticket
• An agent is risk-averse if she always prefers the expected value of the lottery ticket to the lottery ticket– Most people are like this
• An agent is risk-seeking if she always prefers the lottery ticket to the expected value of the lottery ticket
Decreasing marginal utility• Typically, at some point, having an extra dollar
does not make people much happier (decreasing marginal utility)
utility
money$200 $1500 $5000
buy a bike (utility = 1)
buy a car (utility = 2)
buy a nicer car (utility = 3)
Maximizing expected utility
• Lottery 1: get $1500 with probability 1– gives expected utility 2
• Lottery 2: get $5000 with probability .4, $200 otherwise– gives expected utility .4*3 + .6*1 = 1.8– (expected amount of money = .4*$5000 + .6*$200 = $2120 > $1500)
• So: maximizing expected utility is consistent with risk aversion
utility
money$200 $1500 $5000
buy a bike (utility = 1)
buy a car (utility = 2)
buy a nicer car (utility = 3)
Different possible risk attitudes under expected utility maximization
utility
money
• Green has decreasing marginal utility → risk-averse• Blue has constant marginal utility → risk-neutral• Red has increasing marginal utility → risk-seeking• Grey’s marginal utility is sometimes increasing,
sometimes decreasing → neither risk-averse (everywhere) nor risk-seeking (everywhere)
Example: Insurance
25
• Consider the lottery [0.5, $1000; 0.5, $0]
– What is its expected monetary value (EMV)? ($500)
– What is its certainty equivalent?
• Monetary value acceptable in lieu of lottery
• $400 for most people
– Difference of $100 is the insurance premium
• There is an insurance industry because people will pay to reduce their risk
• If everyone were risk-neutral, no insurance needed!
Example: Insurance
26
• Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties’ expected utility
Amount Your Utility UY
$0 0
-$50 -150
-$200 -1000
You own a car. Your lottery:LY = [0.8, $0; 0.2, -$200]
i.e., 20% chance of crashing
You do not want -$200!
Insurance is $50
UY(LY) = 0.2*UY(-$200)=-200
UY(-$50)=-150
Example: Insurance
27
• Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties’ expected utility
You own a car. Your lottery:
LY = [0.8, $0; 0.2, -$200]
i.e., 20% chance of crashing
You do not want -$200!
UY(LY) = 0.2*UY(-$200)=-200
UY(-$50)=-150
Insurance company buys risk:
LI = [0.8, $50; 0.2, -$150]
i.e., $50 revenue + your LY
Insurer is risk-neutral:
U(L) = U(EMV(L))
UI(LI) = U(0.8*50+0.2*(-150))
= U($10) >U($0)
• Finite number of periods:– Overall utility = sum of rewards in individual periods
• Infinite number of periods:– … are we just going to add up the rewards over infinitely many
periods?– Always get infinity!
• (Limit of) average payoff: limn→∞Σ1≤t≤nr(t)/n– Limit may not exist…
• Discounted payoff: Σtϒt r(t) for some ϒ < 1– Interpretations of discounting:
• Interest rate r: ϒ= 1/(1+r)• World ends with some probability 1-ϒ
• Discounting is mathematically convenient– We will see more in the next class
28
Acting optimally over time