1 BAMS 517 Decision Analysis – II Acquiring and Using Information Martin L. Puterman UBC Sauder...

1

BAMS 517
Decision Analysis – II
Acquiring and Using Information

Martin L. Puterman

UBC Sauder School of Business

Winter Term 2 2009

2

The “Monty Hall” problem

Monty Hall was the host of the once-popular game show “Let’s Make a Deal”.

In the show, contestants were shown three doors, behind each of which was a prize. The contestant chose a door and received the prize behind that door.

This setup was behind one of the most notorious problems in probability.

Suppose you are the contestant, and Monty tells you that there is a car behind one of the doors, and a goat behind each of the other doors. (Of course, Monty knows where the car is.)

Suppose you choose door #1.

3

The “Monty Hall” problem

Before revealing what’s behind door #1, Monty says “Now I’m going to reveal to you one of the other doors you didn’t choose” and opens door #3 to show that there is a goat behind the door.

Monty now says: “Before I open door #1, I’m going to allow you to change your choice. Would you rather that I open door #2 instead, or do you want to stick with your original choice of door #1?”

What do you do?

4

Bayes’ rule – repeated in a more general form

Bayes’ Rule can be written in general as

$$P(A_1 \mid B) = \frac{P(B \mid A_1)\,P(A_1)}{P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2) + \cdots + P(B \mid A_n)\,P(A_n)}$$

where A1, A2, …, An is a partition of the sample space. The same formula holds with any Ai in place of A1. Note there is also a continuous version involving integrals. In the above:

P(Ai) for i=1,…,n are called the prior probabilities
P(B|Ai) for i=1,…,n are called the likelihoods
P(Ai|B) for i=1,…,n are called the posterior probabilities

5

The “Monty Hall” problem - analysis

It does not appear at first that Monty’s showing you a goat behind door #3 is at all relevant to whether the car is behind door 1 or door 2. So why should you switch?

A careful analysis is needed here. What is important to remember is that you chose door 1 and Monty knows where the car is.

Let C1, C2, C3 denote the events of the car being behind doors 1, 2, and 3. Initially, you believe P(C1) = P(C2) = P(C3) = 1/3.

Since you chose door #1, Monty could have opened either door #2 or door #3. Denote these events by M2 and M3. He could never open door 1, since that would defeat the purpose of the show.

Suppose the car is really behind door #2. Then in order to show you a goat, Monty would have had to open door #3. Thus P(M3|C2) = 1.

Suppose the car is really behind door #3. Then Monty could not have opened door #3, so P(M3|C3) = 0.

Suppose the car is really behind door #1. Then Monty had the choice of opening either door #2 or door #3. Assume that the probability of Monty’s opening door 3 in this case is ½. So P(M3|C1) = P(M2|C1) = ½. Note that the conclusion that switching is better holds for any P(M3|C1) = p with 0 < p < 1, since then P(C2|M3) = 1/(1+p) > 1/2.

To make a decision, we need to find P(C1 | M3) and P(C2 | M3). To do this we apply Bayes’ Rule.

6

The “Monty Hall” problem - solution

Applying Bayes’ Rule:

$$P(C_1 \mid M_3) = \frac{P(M_3 \mid C_1)\,P(C_1)}{P(M_3 \mid C_1)\,P(C_1) + P(M_3 \mid C_2)\,P(C_2) + P(M_3 \mid C_3)\,P(C_3)} = \frac{\tfrac{1}{2} \cdot \tfrac{1}{3}}{\tfrac{1}{2} \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3}} = \frac{1}{3}$$

Hence P(C2|M3) = 1 - 1/3 = 2/3. Thus your probability of getting the car is actually better if you switch doors! It’s 2/3, rather than 1/3 if you stay with door #1.

The information Monty gives you is relevant because it is more likely Monty will choose door #3 when the car is behind door #2 (assuming you picked door #1).

Ignoring the denominator in Bayes’ rule, we can state

posterior ∝ prior × likelihood

In this problem the prior probabilities are all equal, so the posterior probability is directly proportional to the likelihood. The likelihood of Monty opening door 3 is higher when the car is behind door 2 than when it is behind door 1. Therefore the posterior probability that the car is behind door 2 is higher than the posterior probability that it is behind door 1.

We can also use this formula to compute exact probabilities.
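The Bayes' rule calculation for the Monty Hall problem can be checked with a few lines of Python. This is a minimal sketch, assuming you picked door 1 and Monty opened door 3; the function names and the parameter `p` (the chance that Monty opens door 3 when the car is behind door 1) are illustrative and not part of the slides.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule over a partition: prior_i * lik_i / sum_k prior_k * lik_k."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def monty_posterior(p=Fraction(1, 2)):
    """Posterior over (C1, C2, C3) given M3, when you picked door 1.

    p is the assumed probability P(M3 | C1).
    """
    priors = [Fraction(1, 3)] * 3
    # Likelihoods P(M3 | C1), P(M3 | C2), P(M3 | C3)
    likelihoods = [p, Fraction(1), Fraction(0)]
    return posterior(priors, likelihoods)

stay, switch, _ = monty_posterior()
print("P(C1 | M3) =", stay)     # stay with door 1
print("P(C2 | M3) =", switch)   # switch to door 2

# With a different assumed p, switching is still the better choice.
print("p = 1/4:", monty_posterior(Fraction(1, 4)))
```

Using exact fractions avoids rounding, so the 1/3 versus 2/3 split appears exactly.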

7

Monty Hall Problem revisited

Another way to think about it.

Suppose you do not change your guess. Then you would have won if the car was behind door 1. This happens independent of Monty’s action (why?) with probability 1/3.

Suppose you switch. Then you would have won if the car was not behind door 1. (Why?) This occurs with probability 2/3.

Note that if the car was behind door 3, Monty would have opened door 2.

8

Another and quite different Bayes’ example

Suppose demand for a product is Poisson distributed but the rate parameter λ is not known with certainty.

For simplicity assume a prior distribution for λ given by P(λ=10) = .4 and P(λ=15) = .6. In general we can assume any discrete or continuous distribution for λ. Picking a gamma distribution gives nice formulas for the posterior.

Suppose we observe one week’s demand and it equals 11. How does this change our assessment of P(λ=10) and P(λ=15)?

The likelihoods come from the Poisson probability function

$$f(n \mid \lambda) = \frac{\lambda^n e^{-\lambda}}{n!}$$

so that f(11|10) = .1137 and f(11|15) = .0663.

The posterior probabilities are obtained from Bayes’ Rule in the proportional form on slide 6 (posterior ∝ prior × likelihood):
P(λ=10 | n=11) ∝ .1137 • .4 = .0455
P(λ=15 | n=11) ∝ .0663 • .6 = .0398

Since the posterior probabilities have to sum to one:
P(λ=10 | n=11) = .0455 / (.0455 + .0398) = 0.534
P(λ=15 | n=11) = .0398 / (.0455 + .0398) = 0.466
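The Poisson update can be reproduced numerically. The sketch below uses the two-point prior from the slide, P(λ=10) = .4 and P(λ=15) = .6, and the observed demand of 11; the function names are illustrative only.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability of observing n when the rate is lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def discrete_posterior(prior, observed):
    """Update a discrete prior over candidate rates after one observation.

    prior maps each candidate rate to its prior probability.
    """
    # posterior is proportional to prior times likelihood
    unnormalized = {lam: p * poisson_pmf(observed, lam) for lam, p in prior.items()}
    total = sum(unnormalized.values())
    return {lam: u / total for lam, u in unnormalized.items()}

prior = {10: 0.4, 15: 0.6}
post = discrete_posterior(prior, observed=11)
for lam, prob in post.items():
    print(f"P(lambda={lam} | n=11) = {prob:.3f}")
```

The same function works for any finite set of candidate rates, which is one way to approximate a continuous prior on a grid.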

9

More about Bayesian updating; beta prior and binomial likelihood

Suppose our event has a Binomial distribution:

$$P(X = x) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n$$

The special case n = 1 is a Bernoulli distribution.

Now suppose that our prior distribution on θ is a Beta distribution:

$$f(\theta \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}$$

Emphasis: this is a distribution on θ and has support [0,1]. It is highly flexible and can represent a wide range of shapes. We will call it Beta(α,β). Its mean is α/(α+β) and its variance is αβ/[(α+β)²(α+β+1)].

Suppose we have one realization of the binomial and observe x. Then the posterior distribution is Beta(α+x, β+n-x)!

This is quite amazing. The proof is quite simple and just involves some renormalization in the integration over θ.

Thumbtack experiment revisited. In last class our prior looked pretty flat. After observing one “tack up” in one toss, our updated distribution had become sharper and less spread. This is supported by this theory. We could have fit a prior distribution to our data and then done formal analysis.
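The Beta update rule is simple enough to sketch directly. The example below assumes a flat Beta(1, 1) prior as a stand-in for the "pretty flat" prior of the thumbtack experiment; that choice and the function names are assumptions for illustration, not values from the class.

```python
def beta_update(alpha, beta, x, n):
    """Posterior Beta parameters after x successes in n binomial trials."""
    return alpha + x, beta + n - x

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

# Hypothetical flat prior, then one "tack up" observed in one toss.
prior = (1, 1)
post = beta_update(*prior, x=1, n=1)

print("prior    :", prior, beta_mean(*prior), beta_variance(*prior))
print("posterior:", post, beta_mean(*post), beta_variance(*post))
```

Under this assumed prior the variance falls after a single toss, which matches the remark that the distribution becomes sharper and less spread.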

10

What is our new assessment of probabilities of observing outcomes after our observation? This is a bit more complicated.

The likelihood remains as before, but what we also need is the marginal distribution, that is, the unconditional probability of observing x:

$$p(x) = \int l(x \mid \theta)\, f(\theta)\, d\theta$$

In our example, the prior is Beta(α,β) and the likelihood is Binomial(n,θ). It is simple to derive the marginal distribution of x successes, which is

$$p(x) = \binom{n}{x}\,\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+x)\,\Gamma(\beta+n-x)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+n)}$$

where Γ(x) = (x-1)! when x is a positive integer. If α and β are integers the above simplifies.
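With a Beta(α,β) prior and a binomial likelihood, the marginal distribution of x is the beta-binomial distribution, which can be written as C(n, x) · B(α+x, β+n-x) / B(α, β). The sketch below evaluates it with log-gamma functions for numerical stability; the function names and the parameter values in the example are assumptions chosen for illustration.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, n, alpha, beta):
    """Marginal probability of x successes in n trials under a Beta(alpha, beta) prior."""
    log_p = (math.log(math.comb(n, x))
             + log_beta_fn(alpha + x, beta + n - x)
             - log_beta_fn(alpha, beta))
    return math.exp(log_p)

# Hypothetical prior Beta(2, 3) and n = 10 trials.
probs = [beta_binomial_pmf(x, 10, 2, 3) for x in range(11)]
print([round(p, 4) for p in probs])
print("sum =", round(sum(probs), 10))
```

A useful sanity check is the uniform prior Beta(1, 1), under which every count from 0 to n is equally likely.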

11

Conjugate priors

A prior distribution is said to be conjugate to a likelihood if the posterior distribution has the same form as the prior distribution.

Important examples include:
Beta prior and Binomial likelihood
Normal prior and Normal likelihood
Gamma prior and Poisson likelihood

See the above link for more examples. Certain cases have nice marginal distributions too.

12

Degrees of certainty

In some cases, like flipping a coin or drawing a ball from an urn at random, we feel pretty sure of the probabilities of the outcomes. We really expect the probability of heads to be at or very near to 50%.

In other cases, we may not be so sure. Consider flipping a thumbtack into the air. It could land in one of two ways: ‘pin up’ or ‘pin down’.

[Figure: a thumbtack in the ‘pin up’ and ‘pin down’ positions]

What is the probability of the tack landing pin up on the next throw? Suppose we assess it to be 50%. However, we may not feel that this assessment is as precise as the one we gave for the coin. We just don’t know enough about the tack to say for sure.

13


How can we distinguish between such ‘vague’ and ‘precise’ probability assessments in our analyses?

Let’s consider the following outcomes: the coin has probability p of landing heads, where p can be any number in {0, .01, .02, …, .99, 1}. Label these events as θ0, …, θ100.

Of course, probabilities can be any real number between 0 and 1. To keep things simple and discrete, we are only considering probabilities up to 2 decimals here.

Now consider the thumbtack: it has probability q of landing pin up, where q can be any number in {0, .01, .02, …, .99, 1}. Label these events φ0, …, φ100.

Suppose we assign prior probabilities to each of θ0, …, θ100 and φ0, …, φ100. The prior probabilities for the coin’s likelihood of coming up heads should be much more concentrated around 50% than the prior probabilities for the tack’s likelihood of landing pin up.

Don’t confuse the event θj (the event that the coin has probability j/100 of landing heads when tossed) with the prior probability P(θj) assigned to this event. It may help to think of P(θj) as the probability that the coin is ‘disposed’ to land heads j percent of the time.

14

[Figure: two bar charts of prior probabilities. Thumbtack: prior probability P(phi) against the probability of landing pin up (phi), from 0.01 to 0.97, vertical axis 0 to 0.016. Coin: prior probability P(theta) against the probability of landing heads (theta), from 0.01 to 0.97, vertical axis 0 to 0.09.]

The graphs display representative prior probabilities for the coin and the tack.

Note our prior distribution for the probability of the tack landing pin up is more ‘spread out’ than that for the probability of the coin landing heads.

Let H and U be the events of the coin landing heads and the tack landing pin up in a given toss, respectively. Then

$$P(H) = \sum_i P(H \mid \theta_i)\,P(\theta_i), \qquad P(U) = \sum_i P(U \mid \phi_i)\,P(\phi_i)$$

For the examples given above, P(U) = P(H) = 0.5, even though P(θi) ≠ P(φi).

15


The more certain one is about the occurrence of an event, the more reluctant one will be to accept that this event does not occur in the light of evidence to the contrary.

Suppose you flip the coin/tack 10 times and find that all 10 times it lands heads/pin up. Denote this event by E. This would suggest that the coin/tack is strongly biased in favor of landing heads/pin up. However, since we are more sure that the coin is fair than that the tack is, we may be less willing to accept that the coin is strongly biased, and more inclined to write off the event as a fluke.

The resulting posterior probabilities P(θi | E) and P(φi | E) are plotted in red below. The posterior probabilities for the tack have responded greatly to the new evidence, but those for the coin have shifted only slightly. Also, now P(H | E) = .55, but P(U | E) = .86.

[Figure: two charts of prior and posterior probabilities. Coin: P(theta) against the probability of landing heads (theta), from 0.01 to 0.97, vertical axis 0 to 0.09. Thumbtack: P(phi) against the probability of landing pin up (phi), from 0.01 to 0.97, vertical axis 0 to 0.06.]
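The effect of a concentrated versus a spread-out prior can be explored on the same 0, .01, …, 1 grid. The priors actually used for the slides' plots are not given, so the sketch below substitutes hypothetical bell-shaped priors centred at 0.5 (the `spread` values are assumptions) and will not reproduce the quoted .55 and .86 exactly.

```python
import math

GRID = [i / 100 for i in range(101)]  # candidate probabilities 0, .01, ..., 1

def bell_prior(spread):
    """Hypothetical prior on GRID peaked at 0.5; larger spread means a vaguer belief."""
    weights = [math.exp(-((p - 0.5) / spread) ** 2 / 2) for p in GRID]
    total = sum(weights)
    return [w / total for w in weights]

def update(prior, successes, trials):
    """Posterior over GRID after `successes` in `trials` independent tosses."""
    joint = [pr * p ** successes * (1 - p) ** (trials - successes)
             for pr, p in zip(prior, GRID)]
    total = sum(joint)
    return [j / total for j in joint]

def predictive(dist):
    """Probability of a success on the next toss: sum over the grid of p * P(p)."""
    return sum(p * w for p, w in zip(GRID, dist))

coin_prior = bell_prior(spread=0.05)  # assumed: tightly concentrated
tack_prior = bell_prior(spread=0.25)  # assumed: spread out

coin_post = update(coin_prior, successes=10, trials=10)
tack_post = update(tack_prior, successes=10, trials=10)

print("coin:", round(predictive(coin_prior), 3), "->", round(predictive(coin_post), 3))
print("tack:", round(predictive(tack_prior), 3), "->", round(predictive(tack_post), 3))
```

Both priors give a predictive probability of 0.5 before the tosses, but the vaguer tack prior moves much further after ten successes in a row.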

16

Degrees of certainty

If we have no information regarding the likelihood of events 1, …, n, then we should assign them equal (uniform) probabilities in our prior assessment.

If we are more certain that some events will occur, we assign them a higher prior probability than they would receive under the uniform assignment.

Suppose you attribute a probability of 0 to the event A. This indicates that you are absolutely certain that A will not occur, and no amount of information will then sway you to believe that A might occur:
P(A | E) ∝ prior • likelihood = 0, since the prior P(A) = 0

This also implies that if you assign probability 1 to an event A, then no evidence will make you believe that A will not occur.

As a general rule, you should avoid assigning probabilities of 0 or 1 to events whose probabilities might be revised in light of subsequent information.

Cromwell’s Rule:
0 < P(A | H) < 1, unless H logically implies A or its complement A^c

Use small or large probabilities like .001 or .999 instead.

17

Complex decisions

So far, the decisions we have studied have been quite simple, usually consisting of a single decision and a single uncertain event

We’ll now show how to solve more complex decision problems, by repeatedly using the principles we have developed for simple problems

The basic tool we will use to structure complex problems is the decision tree

The major new principle used in solving complex decisions is the idea of backward induction

18

Example: Hatton Realty

Imagine you are a real estate agent in Vancouver. One day, a new client comes to you and says:

“I currently own several properties in the West Side of Vancouver that I would like to sell. I would like to list three of my properties in Dunbar, Kitsilano, and Point Grey through your agency, but on the following conditions:

1) You sell the Dunbar property for $20,000, the Kitsilano property for $40,000, and the Point Grey property for $80,000
2) You will receive a 5% commission on each sale
3) You must sell the Dunbar property first, and within a month
4) If you sell the Dunbar property, then you may either sell the Kitsilano or Point Grey property next; otherwise, you will lose the right to sell these properties
5) If you sell this second property within a month, then you may list the third property; otherwise, you lose the right to sell the third property”

In considering this proposition, you assess the promotional cost of listing these properties, as well as the probability of selling them, to be:

Property      Listing Cost   Prob. of Sale
Dunbar        $800           .6
Kitsilano     $200           .7
Point Grey    $400           .5

19

Structuring complex decision problems

In the Hatton Realty problem, several events and decisions are made in sequence

The first step in analyzing a more complex decision such as this is to map the chronology of decisions and events in a timeline. Trace every possible progression of decisions and events that may occur:
What is the first decision that needs to be made?
For each action that you might take at this point, what uncertain events follow that may impact future decisions or outcomes? What are their probabilities of occurrence?
For every possible event realization, what subsequent decisions or events can take place? Etc.

It is usually helpful to think of there being a sequence of decision points, between each of which may occur one or more uncertain events

Once you come to the end of each possible chain of decisions and events, you need to enter the total value of reaching that point.

20

Decision trees

A useful way to represent complex decisions is through a decision tree.

Every decision point is represented by a square node in the tree. Every “branch” leading out of such a node represents a possible decision.

Each uncertain event is represented by a round node, and every “branch” leading out of such a node represents a possible realization of the event. These branches are labeled with the probabilities of each of these realizations

The “root” of the tree corresponds to the first decision that must be taken

The “leaves” of the tree represent final outcomes

21

The Hatton Realty Decision Problem

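[Decision tree. Payoffs at the leaves are cumulative commissions less listing costs.]

Refuse to List Dunbar: $0
List Dunbar (event node)
    Don’t Sell Dunbar (.4): -$800
    Sell Dunbar (.6) (decision node)
        Refuse next property: $200
        List Kits (event node)
            Don't Sell (.3): $0
            Sell Kits (.7) (decision node)
                Refuse Pt Grey: $2,000
                Accept Pt Grey (event node)
                    Sell (.5): $5,600
                    Don't Sell (.5): $1,600
        List Pt. Grey (event node)
            Don't Sell (.5): -$200
            Sell PG (.5) (decision node)
                Refuse Kits: $3,800
                Accept Kits (event node)
                    Sell (.7): $5,600
                    Don't Sell (.3): $3,600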

22

Backward induction

Once we have structured the decision as a decision tree, how do we determine the correct actions to take at each decision node?

We use the following node replacement method to reduce the size and complexity of the tree:
Replace any terminal event node, whose branches have probabilities p and 1-p and lead to leaves a and b, with the leaf pa + (1-p)b
Replace any terminal decision node, whose branches lead to leaves a and b, with the leaf max(a, b)

Replacing these nodes, we work backwards from the end of the tree, computing expected utilities and decision values as we go, until we replace the entire tree with a single leaf.

In replacing the decision nodes, we need to record which decisions achieve the maximum – these are the decisions we want to take if we ever reach this node.
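The node replacement method can be sketched as a short recursive function. The tuple encoding of nodes and the node names below are assumptions made for this illustration; the payoffs and probabilities are those of the Hatton Realty tree on slide 21.

```python
def rollback(node, policy):
    """Backward induction on a tree of nested tuples.

    Assumed encoding (not from the slides):
      a number                               -> leaf payoff
      ("chance", [(prob, child), ...])       -> event node
      ("decision", name, {label: child})     -> decision node
    The best label at each decision node is recorded in `policy`.
    """
    if isinstance(node, (int, float)):
        return node
    kind = node[0]
    if kind == "chance":
        # Replace an event node by its expected value.
        return sum(prob * rollback(child, policy) for prob, child in node[1])
    if kind == "decision":
        # Replace a decision node by the maximum over its branches.
        _, name, options = node
        values = {label: rollback(child, policy) for label, child in options.items()}
        best = max(values, key=values.get)
        policy[name] = best
        return values[best]
    raise ValueError(f"unknown node type: {kind!r}")

after_kits = ("decision", "Kits sold", {
    "Refuse Pt Grey": 2000,
    "Accept Pt Grey": ("chance", [(0.5, 5600), (0.5, 1600)]),
})
after_pg = ("decision", "Pt Grey sold", {
    "Refuse Kits": 3800,
    "Accept Kits": ("chance", [(0.7, 5600), (0.3, 3600)]),
})
after_dunbar = ("decision", "Dunbar sold", {
    "Refuse next property": 200,
    "List Kits": ("chance", [(0.7, after_kits), (0.3, 0)]),
    "List Pt Grey": ("chance", [(0.5, after_pg), (0.5, -200)]),
})
hatton = ("decision", "Start", {
    "Refuse to list Dunbar": 0,
    "List Dunbar": ("chance", [(0.6, after_dunbar), (0.4, -800)]),
})

policy = {}
value = rollback(hatton, policy)
print("value:", round(value, 2))
for name, choice in policy.items():
    print(f"{name}: {choice}")
```

The policy dictionary also records the best choice at nodes that the optimal policy never reaches, such as the decision after Point Grey sells, which is what "if we ever reach this node" asks for.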

23

Solving Hatton Realty


[Decision tree as on slide 21, before any nodes are replaced.]

24

[Tree after replacing the last event nodes with their expected values:
Accept Kits (after Pt. Grey sells) = .7 × $5,600 + .3 × $3,600 = $5,000
Accept Pt Grey (after Kits sells) = .5 × $5,600 + .5 × $1,600 = $3,600]

25

[Tree after replacing the last decision nodes:
After Pt. Grey sells: max(Refuse Kits $3,800, Accept Kits $5,000) = $5,000
After Kits sells: max(Refuse Pt Grey $2,000, Accept Pt Grey $3,600) = $3,600]

26

[Tree after replacing the next event nodes:
List Kits = .7 × $3,600 + .3 × $0 = $2,520
List Pt. Grey = .5 × $5,000 + .5 × (-$200) = $2,400]

27

[Tree after replacing the decision node that follows the sale of Dunbar:
max(Refuse next property $200, List Kits $2,520, List Pt. Grey $2,400) = $2,520]

28

[Tree after replacing the Dunbar event node:
List Dunbar = .6 × $2,520 + .4 × (-$800) = $1,192
Refuse to list Dunbar = $0]

29

Solving the realty problem

The value of the proposition is $1192.

You will choose the following policy: List Dunbar. If Dunbar sells, list Kits. If Kits sells, list Pt Grey.

Note that we used the expected monetary values as outcomes.

In this problem, none of the probabilities assigned to each event depended on prior decisions. We assumed that the probabilities of each sale were independent of the order in which the properties went on the market.

This will not always be the case – sometimes the probabilities of events will depend on previous decisions and previous events.

The probability assigned to any event realization should be the probability of that event conditioned on all decisions and events that preceded it up to that point.

30

Payoff distribution under the optimal policy

Amount Probability

-800 .4

0 .18

1600 .21

5600 .21

Mean Payoff ? Standard deviation of payoff?
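One way to check an answer to the two questions above is to compute the moments directly from the payoff table. The sketch below uses only the amounts and probabilities listed on this slide; the variable names are illustrative.

```python
import math

# Payoff distribution under the optimal policy (amount -> probability).
payoffs = {-800: 0.40, 0: 0.18, 1600: 0.21, 5600: 0.21}

mean = sum(x * p for x, p in payoffs.items())
variance = sum(p * (x - mean) ** 2 for x, p in payoffs.items())
std_dev = math.sqrt(variance)

print(f"mean payoff        = {mean:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

The mean agrees with the value of the proposition found by backward induction, while the standard deviation is roughly twice as large, which indicates how risky the optimal policy is.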

31

Newsvendor Problem

Items cost c, you sell them for p, and if they can’t be sold you receive a scrap value s.
For every unit you sell you make a profit of G = p-c.
For every unit you order and do not sell you incur a loss L = c-s.
Demand D is unknown and either discrete with distribution P(D=d) or continuous with density f(d).
How many should you order to maximize expected profit?
Handout to be distributed.
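For discrete demand, the question can be explored by brute force and compared with the standard critical-ratio rule (order the smallest q with P(D ≤ q) ≥ G/(G+L)). This is a sketch with made-up numbers: the demand distribution and the values G = 4 and L = 4 are hypothetical, and the handout's own derivation may be organized differently.

```python
def expected_profit(q, demand_pmf, G, L):
    """Expected profit from ordering q units: G per unit sold, -L per unit left over."""
    return sum(prob * (G * min(q, d) - L * max(q - d, 0))
               for d, prob in demand_pmf.items())

def best_order(demand_pmf, G, L):
    """Order quantity that maximizes expected profit, by enumeration."""
    candidates = range(max(demand_pmf) + 1)
    return max(candidates, key=lambda q: expected_profit(q, demand_pmf, G, L))

def critical_ratio_order(demand_pmf, G, L):
    """Smallest q with P(D <= q) >= G / (G + L)."""
    ratio = G / (G + L)
    cumulative = 0.0
    for d in sorted(demand_pmf):
        cumulative += demand_pmf[d]
        if cumulative >= ratio:
            return d
    return max(demand_pmf)

# Hypothetical example: c = 6, p = 10, s = 2, so G = 4 and L = 4.
demand = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}
G, L = 4, 4

print("enumeration   :", best_order(demand, G, L))
print("critical ratio:", critical_ratio_order(demand, G, L))
```

Enumeration is practical only for small demand ranges; the critical-ratio rule gives the same order quantity without evaluating every candidate.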

32

Using Information

33

Value of Information

Suppose in the Hatton Realty case you could do some market research before accepting the offer.

What would it be worth to know in advance whether or not the Dunbar property would sell? Or how much would you pay a clairvoyant to get this information?

Let’s simplify the Hatton Realty decision tree to see how to take the availability of this information into account.

Assume for now the only option available is to list the Dunbar property by itself.

34

Revised realty problem – Dunbar only


[Decision tree]

Do not list Dunbar: $0
List Dunbar (event node)
    Sell Dunbar (.6): $200
    Don’t Sell Dunbar (.4): -$800

35

Solving revised problem

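[Decision tree after replacing the event node]

Do not List Dunbar: $0
List Dunbar: .6 × $200 + .4 × (-$800) = -$200
Value of the decision: max($0, -$200) = $0, so do not list Dunbar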

36

Revised realty problem – Dunbar only; but with perfect information

[Decision tree: the outcome of the chance event is known before the decision]

Dunbar will sell (.6)
    List Dunbar: $200
    Don’t list Dunbar: $0
Dunbar won’t sell (.4)
    List Dunbar: -$800
    Don’t list Dunbar: $0

37

Revised realty problem – Dunbar only; but with perfect information

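[Decision tree with node values]

Dunbar will sell (.6): best decision is List Dunbar, value $200
    List Dunbar: $200
    Don’t list Dunbar: $0
Dunbar won’t sell (.4): best decision is Don’t list Dunbar, value $0
    List Dunbar: -$800
    Don’t list Dunbar: $0
Value with perfect information: .6 × $200 + .4 × $0 = $120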

38

Selling Dunbar and the Value of Perfect Information

If we knew in advance that Dunbar would sell, we would list it and make $200.

If we knew in advance that Dunbar would not sell, we would not list it and earn $0 (saving $800).

Thus the expected value under this information would be .6 • $200 + .4 • $0 = $120.

If we didn’t know the outcome of this chance event, we would not list Dunbar and have an expected value of $0.

Thus knowing in advance whether or not Dunbar will sell is worth $120. This is called the expected value of perfect information (EVPI).

In general the EVPI is the difference between the expected value when the uncertain event is resolved before the decision and the expected value without this information (the base case).

Usually our base case is the no information case.

The EVPI is the most we would pay for any information about the event, be it a survey, market research or ….

Exercise: What is the EVPI for the event Point Grey Sells? Kitsilano Sells? They both sell? …

What is the EVPI for the newsvendor problem?

39

Expected value of perfect information

The difference in the expected utility value between a decision made under uncertainty and the same decision made with the outcomes of the uncertainties known prior to the decision is the expected value of perfect information (EVPI).

It is not difficult to see that removing the uncertainties from a decision problem cannot decrease its value, so that the EVPI is never negative. Let Cij be the payoff from decision di when outcome j occurs.

Under normal conditions, when the outcomes of the uncertain event are not known, the value of the decision is

$$D_U = \max_i \sum_j C_{ij}\,P(j)$$

When the uncertainties are removed prior to choosing di, the value of the decision becomes

$$D_{PI} = \sum_j \left[\max_i C_{ij}\right] P(j)$$

Since max_i Cij ≥ Cij for every decision i and outcome j, each term in the second sum is at least as large as the corresponding term in the first sum for any fixed i, so we can see that DPI ≥ DU.
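The two values and their difference can be computed from a payoff matrix in a few lines. The sketch below applies the formulas for the value under uncertainty and the value with perfect information to the Dunbar-only problem; the function names and the row/column layout are assumptions for illustration.

```python
def value_under_uncertainty(payoff, probs):
    """max over decisions i of sum over outcomes j of C_ij * P(j)."""
    return max(sum(c * p for c, p in zip(row, probs)) for row in payoff)

def value_with_perfect_information(payoff, probs):
    """sum over outcomes j of [max over decisions i of C_ij] * P(j)."""
    columns = zip(*payoff)
    return sum(max(col) * p for col, p in zip(columns, probs))

def evpi(payoff, probs):
    return (value_with_perfect_information(payoff, probs)
            - value_under_uncertainty(payoff, probs))

# Dunbar-only problem: rows are decisions, columns are (sells, does not sell).
payoff = [
    [200, -800],  # list Dunbar
    [0, 0],       # do not list Dunbar
]
probs = [0.6, 0.4]

print("value under uncertainty:", value_under_uncertainty(payoff, probs))
print("value with perfect info:", value_with_perfect_information(payoff, probs))
print("EVPI:", evpi(payoff, probs))
```

Because the maximum is taken column by column in the second function, its result can never fall below the first, which is the inequality stated above.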

40

Partial information

Suppose someone has available some information X which is relevant to a decision problem you are facing. Suppose X could take on one of several values X1, X2, …, XK.

You have a prior P(X1), …, P(XK) for each of these possible messages.

The expected payoff that results from having this partial information is

$$\sum_k \left[\max_i \sum_j C_{ij}\,P(j \mid X_k)\right] P(X_k)$$

Exercise: Show that acquiring any additional (even imperfect) information can never reduce the value of a decision problem.

41

Example: Balls and Urns (Raiffa, 1970)

A room contains 1000 urns: 800 Type 1 and 200 Type 2.
Type 1 urns contain 4 red and 6 white balls.
Type 2 urns contain 9 red balls and 1 white ball.

Decisions:
A1 - Pick an urn and say Type 1
A2 - Pick an urn and say Type 2
A3 - Do not play

Payoffs:
A1 - if correct, win $40; if wrong, lose $20
A2 - if correct, win $100; if wrong, lose $5
A3 - $0

For $8 you can draw and observe a ball from the urn before guessing. What should you do?

42

Balls and urns

Draw a decision tree for the problem without sampling.

The expected value of A1 = .8•40 - .2•20 = $28, of A2 = .2•100 - .8•5 = $16, and of A3 = $0.

So it is best to guess Type 1.

Expected value under PI = .8•40 + .2•100 = $52.

So EVPI = $52 - $28 = $24, which is the most we would pay for any information.

43

Draw a ball sub tree (without probabilities)

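[Decision tree. Payoffs are net of the $8 cost of drawing a ball.]

Red
    Guess Type 1: urn is Type 1, 32; urn is Type 2, -28
    Guess Type 2: urn is Type 1, -13; urn is Type 2, 92
    Do not play: -8
White
    Guess Type 1: urn is Type 1, 32; urn is Type 2, -28
    Guess Type 2: urn is Type 1, -13; urn is Type 2, 92
    Do not play: -8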

45

Draw a ball sub tree (with probabilities)

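[Decision tree. Payoffs are net of the $8 cost of drawing a ball.]

Red (.5): Type 1 with probability .64, Type 2 with probability .36
    Guess Type 1: urn is Type 1 (.64), 32; urn is Type 2 (.36), -28
    Guess Type 2: urn is Type 1 (.64), -13; urn is Type 2 (.36), 92
    Do not play: -8
White (.5): Type 1 with probability .96, Type 2 with probability .04
    Guess Type 1: urn is Type 1 (.96), 32; urn is Type 2 (.04), -28
    Guess Type 2: urn is Type 1 (.96), -13; urn is Type 2 (.04), 92
    Do not play: -8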

46

Draw a ball sub tree (with decisions)

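[Decision tree with expected values. Payoffs are net of the $8 cost of drawing a ball.]

Red (.5): best decision is Guess Type 2, value 24.80
    Guess Type 1: .64 × 32 + .36 × (-28) = 10.40
    Guess Type 2: .64 × (-13) + .36 × 92 = 24.80
    Do not play: -8
White (.5): best decision is Guess Type 1, value 29.60
    Guess Type 1: .96 × 32 + .04 × (-28) = 29.60
    Guess Type 2: .96 × (-13) + .04 × 92 = -8.80
    Do not play: -8
Value of drawing a ball: .5 × 24.80 + .5 × 29.60 = 27.20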

47

Analysis

If you draw a ball, the best strategy is: if red, guess Type 2; if white, guess Type 1. The expected value of this strategy is $27.20 (or $0.80 less than in the no information case).

So the optimal strategy for the combined problem is to guess Type 1 without drawing a ball.

What is the most you would pay for drawing a ball?

Suppose instead you could draw 2 balls for $12?

Suppose for $9 you could draw one ball, look at it and decide whether or not to draw a second ball (with or without replacement) for $4.50.

These last two problems will be part of HW #1.
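The single-draw analysis can be reproduced with a short script that applies Bayes' rule for each ball colour and then picks the best guess. The sketch below uses the urn counts and payoffs from the example and keeps the $8 fee separate from the payoffs; the dictionary layout and function names are assumptions for illustration.

```python
prior = {"type1": 0.8, "type2": 0.2}
p_red = {"type1": 0.4, "type2": 0.9}  # chance of drawing red from each urn type
payoff = {
    "guess type1": {"type1": 40, "type2": -20},
    "guess type2": {"type1": -5, "type2": 100},
    "do not play": {"type1": 0, "type2": 0},
}

def best_action(belief):
    """Action with the highest expected payoff under the given belief."""
    values = {a: sum(belief[t] * row[t] for t in belief) for a, row in payoff.items()}
    action = max(values, key=values.get)
    return action, values[action]

def posterior(colour):
    """Posterior over urn types given the ball colour, and P(colour)."""
    like = {t: p_red[t] if colour == "red" else 1 - p_red[t] for t in prior}
    total = sum(prior[t] * like[t] for t in prior)
    return {t: prior[t] * like[t] / total for t in prior}, total

_, value_no_sample = best_action(prior)

value_with_sample = 0.0  # expected payoff before subtracting the sampling fee
plan = {}
for colour in ("red", "white"):
    belief, p_colour = posterior(colour)
    plan[colour], v = best_action(belief)
    value_with_sample += p_colour * v

print("no sample  :", round(value_no_sample, 2))
print("with sample:", round(value_with_sample, 2), plan)
print("most to pay for one draw:", round(value_with_sample - value_no_sample, 2))
```

Comparing the last printed figure with the $8 fee shows why drawing a ball is not worthwhile at that price.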

48

Example: determining a sample size

Suppose you are testing the effectiveness of a new drug. You will administer the drug to a group of patients to test their reaction to it

If the company decides to market the drug and it is better than existing treatments, then your company will make profits of $1M. If it is not better, then the company will lose $1M. If the company decides not to market the drug, it will make $0

The company is risk-neutral.

Existing treatments have a cure rate of 60%. You think that either the new drug will be as effective as the existing treatment, or it will cure 80% of patients.

Your prior probability for the drug being better than existing treatments is 0.5

It costs $5000 per experimental patient. How many patients should you test?

49


The more people you test, the more sure you will be of the effectiveness of the drug

In selecting a sample size, you are effectively choosing the amount of information you will receive

The more information you have, the greater the value of the end decision: do you market the drug or not?

If you choose a sample size of N, there is a gain in the expected value of the information in the sample, which is increasing in N, as well as a cost of $5000 N.

The goal in choosing a sample size is to maximize the “profit” of experimenting, i.e., the expected value of partial information (EVpI), minus the cost of experimenting.

While the cost grows linearly, the expected value of partial information increases up to a finite limit – the expected value of perfect information, which equals $500,000 in this case

Therefore, there will be a sample size that will be too large in the sense that the profit of experimenting will be negative

50

[Figure: EVpI and cost (dollar value, 0 to 6 × 10^5) plotted against sample size (0 to 120), with annotations marking the expected value of perfect information and the maximum profit.]

The computations for this problem are straightforward but tedious, and can be easily done on a computer.

The graph shows the EVpI and the cost as a function of the sample size.

EVpI – Cost is maximized at a sample size of 22.
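The computation can be sketched as follows, under one reading of the problem: the number of cures in a sample of N patients is binomial with cure rate 0.6 or 0.8, each with prior probability 0.5, and the company markets the drug only when the expected payoff given the sample is positive. The function and constant names are illustrative, and the script is meant as a way to check the reported sample size under these assumptions rather than as the computation behind the graph.

```python
import math

PRIOR_BETTER = 0.5
CURE_SAME, CURE_BETTER = 0.6, 0.8
GAIN, LOSS = 1_000_000, 1_000_000
COST_PER_PATIENT = 5_000

def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def evpi_partial(n):
    """Expected value of the information in a sample of n patients (EVpI)."""
    value = 0.0
    for cures in range(n + 1):
        joint_better = PRIOR_BETTER * binom_pmf(cures, n, CURE_BETTER)
        joint_same = (1 - PRIOR_BETTER) * binom_pmf(cures, n, CURE_SAME)
        # Market only if the expected payoff for this outcome is positive.
        value += max(0.0, GAIN * joint_better - LOSS * joint_same)
    # Value of the decision with no sample at all (0 for these numbers).
    no_info = max(0.0, GAIN * PRIOR_BETTER - LOSS * (1 - PRIOR_BETTER))
    return value - no_info

def net_value(n):
    """EVpI minus the cost of testing n patients."""
    return evpi_partial(n) - COST_PER_PATIENT * n

best_n = max(range(0, 121), key=net_value)
print("best sample size:", best_n)
print("EVpI at best size:", round(evpi_partial(best_n)))
print("net value at best size:", round(net_value(best_n)))
```

Since EVpI can never exceed the $500,000 expected value of perfect information, any sample of 100 or more patients costs at least as much as it could possibly be worth.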
