6.2 Probability Theory Longin Jan Latecki Temple University


Transcript of 6.2 Probability Theory Longin Jan Latecki Temple University

Page 1: 6.2 Probability Theory Longin Jan Latecki Temple University

Module #19 – Probability

6.2 Probability Theory

Longin Jan Latecki, Temple University

Slides for a course based on the text Discrete Mathematics & Its Applications (6th Edition) by Kenneth H. Rosen, based on slides by Michael P. Frank and Andrew W. Moore.

Page 2

Terminology

• A (stochastic) experiment is a procedure that yields one of a given set of possible outcomes.
• The sample space S of the experiment is the set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a real value to each outcome of an experiment.

Normally, a probability is related to an experiment or a trial. Take flipping a coin, for example: what are the possible outcomes? Heads or tails (the front or back side of the coin) will face upwards. After a sufficient number of tosses, we can "statistically" conclude that the probability of heads is 0.5. In rolling a die, there are 6 outcomes. Suppose we want to calculate the probability of the event of rolling an odd number. What is that probability?

Page 3

Random Variables

• A "random variable" V is any variable whose value is unknown, or whose value depends on the precise situation.
  – E.g., the number of students in class today
  – Whether it will rain tonight (a Boolean variable)
• The proposition V = v_i may have an uncertain truth value, and may be assigned a probability.

Page 4

Example 10

• A fair coin is flipped 3 times. Let S be the sample space of the 8 possible outcomes, and let X be a random variable that assigns to each outcome the number of heads in that outcome.
• The random variable X is a function X: S → X(S), where X(S) = {0, 1, 2, 3} is the range of X (the number of heads), and S = { (TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH) }.
• X(TTT) = 0; X(TTH) = X(HTT) = X(THT) = 1; X(HHT) = X(THH) = X(HTH) = 2; X(HHH) = 3.
• The probability distribution (pdf) of random variable X is given by P(X=3) = 1/8, P(X=2) = 3/8, P(X=1) = 3/8, P(X=0) = 1/8.
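The distribution in Example 10 can be reproduced by brute-force enumeration. A minimal sketch in Python (the slides name no language; `Fraction` is used for exact probabilities):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Enumerate the 8 equally likely outcomes of 3 fair coin flips.
outcomes = list(product("HT", repeat=3))

def X(outcome):
    """The random variable X: number of heads in the outcome."""
    return outcome.count("H")

# Tally P(X = k) as exact fractions.
counts = Counter(X(o) for o in outcomes)
dist = {k: Fraction(c, len(outcomes)) for k, c in counts.items()}

for k in sorted(dist):
    print(k, dist[k])   # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```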

Page 5

Experiments & Sample Spaces

• A (stochastic) experiment is any process by which a given random variable V gets assigned some particular value, and where this value is not necessarily known in advance.
  – We call it the "actual" value of the variable, as determined by that particular experiment.
• The sample space S of the experiment is just the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific value v_i of the random variable that is selected.

Page 6

Events

• An event E is any set of possible outcomes in S…
  – That is, E ⊆ S = dom[V].
• E.g., the event that "fewer than 50 people show up for our next class" is represented as the set {1, 2, …, 49} of values of the variable V = (# of people here next class).
• We say that event E occurs when the actual value of V is in E, which may be written V∈E.
  – Note that V∈E denotes the proposition (of uncertain truth) asserting that the actual outcome (value of V) will be one of the outcomes in the set E.

Page 7

Probabilities

• We write P(A) as "the fraction of possible worlds in which A is true."
• We could at this point spend 2 hours on the philosophy of this.
• But we won't.

Page 8

Visualizing A

[Figure: the event space of all possible worlds, with total area 1. Worlds in which A is false lie outside a reddish oval; worlds in which A is true lie inside it. P(A) = area of the reddish oval.]

Page 9

Probability

• The probability p = Pr[E] ∈ [0,1] of an event E is a real number representing our degree of certainty that E will occur.
  – If Pr[E] = 1, then E is absolutely certain to occur,
    • thus V∈E has the truth value True.
  – If Pr[E] = 0, then E is absolutely certain not to occur,
    • thus V∈E has the truth value False.
  – If Pr[E] = ½, then we are maximally uncertain about whether E will occur; that is,
    • V∈E and V∉E are considered equally likely.
  – How do we interpret other values of p?

Note: We could also define probabilities for more general propositions, as well as events.

Page 10

Four Definitions of Probability

• Several alternative definitions of probability are commonly encountered:
  – Frequentist, Bayesian, Laplacian, Axiomatic
• They have different strengths & weaknesses, philosophically speaking.
  – But fortunately, they coincide with each other and work well together in the majority of cases that are typically encountered.

Page 11

Probability: Frequentist Definition

• The probability of an event E is the limit, as n→∞, of the fraction of times that we find V∈E over the course of n independent repetitions of (different instances of) the same experiment:

  Pr[E] :≡ lim_{n→∞} n_E / n,  where n_E is the number of repetitions in which V∈E.

• Some problems with this definition:
  – It is only well-defined for experiments that can be independently repeated, infinitely many times!
    • or at least, if the experiment can be repeated in principle, e.g., over some hypothetical ensemble of (say) alternate universes.
  – It can never be measured exactly in finite time!
• Advantage: It's an objective, mathematical definition.

Page 12

Probability: Bayesian Definition

• Suppose a rational, profit-maximizing entity R is offered a choice between two rewards:
  – Winning $1 if and only if the event E actually occurs.
  – Receiving p dollars (where p ∈ [0,1]) unconditionally.
• If R can honestly state that he is completely indifferent between these two rewards, then we say that R's probability for E is p, that is, Pr_R[E] :≡ p.
• Problem: It's a subjective definition; it depends on the reasoner R, and his knowledge, beliefs, & rationality.
  – The version above additionally assumes that the utility of money is linear.
    • This assumption can be avoided by using "utils" (utility units) instead of dollars.

Page 13

Probability: Laplacian Definition

• First, assume that all individual outcomes in the sample space are equally likely to each other…
  – Note that this term still needs an operational definition!
• Then, the probability of any event E is given by Pr[E] = |E|/|S|. Very simple!
• Problems: Still needs a definition for equally likely, and depends on the existence of some finite sample space S in which all outcomes in S are, in fact, equally likely.

Page 14

Probability: Axiomatic Definition

• Let p be any total function p: S → [0,1] such that Σ_s p(s) = 1.
• Such a p is called a probability distribution.
• Then, the probability under p of any event E ⊆ S is just:

  Pr[E] :≡ Σ_{s∈E} p(s)

• Advantage: Totally mathematically well-defined!
  – This definition can even be extended to apply to infinite sample spaces, by changing Σ→∫ and calling p a probability density function or a probability measure.
• Problem: Leaves operational meaning unspecified.
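The axiomatic definition translates directly into code. A minimal sketch in Python (the fair-die distribution here is an illustrative choice, not from the slides):

```python
from fractions import Fraction

# A probability distribution over a finite sample space S:
# a total function p: S -> [0,1] whose values sum to 1. Here, a fair die.
p = {s: Fraction(1, 6) for s in range(1, 7)}

def pr(event, p=p):
    """Pr[E] := sum of p(s) over the outcomes s in event E (a subset of S)."""
    return sum(p[s] for s in event)

odd = {1, 3, 5}
print(pr(odd))   # 1/2
```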

Page 15

The Axioms of Probability

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

Page 16

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.

Page 17

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.

Page 18

Interpreting the axioms

• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

[Figure: Venn diagram of overlapping events A and B, illustrating the inclusion-exclusion axiom.]

Page 19

These Axioms are Not to be Trifled With

• There have been attempts to develop different methodologies for uncertainty:
• Fuzzy logic
• Three-valued logic
• Dempster-Shafer
• Non-monotonic reasoning
• But the axioms of probability are the only system with this property:

If you gamble using them, you can't be unfairly exploited by an opponent using some other system [de Finetti 1931]

Page 20

Theorems from the Axioms

• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

From these we can prove:

P(not A) = P(~A) = 1 - P(A)

• How?

Page 21

Another important theorem

• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)

From these we can prove:

P(A) = P(A ^ B) + P(A ^ ~B)

• How?

Page 22

Probability of an event E

The probability of an event E is the sum of the probabilities of the outcomes in E. That is,

  p(E) = Σ_{s∈E} p(s)

Note that, if there are n outcomes in the event E, that is, if E = {a_1, a_2, …, a_n}, then

  p(E) = Σ_{i=1}^{n} p(a_i)

Page 23

Example

• What is the probability that, if we flip a coin three times, we get an odd number of tails?

The sample space is (TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH); the outcomes with an odd number of tails are (TTT), (THH), (HHT), (HTH).

Each outcome has probability 1/8, so

p(odd number of tails) = 1/8 + 1/8 + 1/8 + 1/8 = ½
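The same answer falls out of enumerating the sample space and filtering for the event. A minimal sketch in Python:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=3))    # 8 equally likely outcomes

# Event E: the outcome has an odd number of tails.
E = [o for o in outcomes if o.count("T") % 2 == 1]

prob = Fraction(len(E), len(outcomes))
print(sorted("".join(o) for o in E), prob)  # ['HHT', 'HTH', 'THH', 'TTT'] 1/2
```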

Page 24

Visualizing Sample Space

• 1. Listing
  – S = {Head, Tail}
• 2. Venn Diagram
• 3. Contingency Table
• 4. Decision Tree Diagram

Page 25

Venn Diagram

Experiment: Toss 2 coins. Note the faces.

Sample space: S = {HH, HT, TH, TT}

[Figure: Venn diagram showing the sample space S containing the four outcomes HH, HT, TH, TT, with the event "Tail" marked as a region of outcomes.]

Page 26

Contingency Table

Experiment: Toss 2 coins. Note the faces.

1st Coin \ 2nd Coin | Head   | Tail   | Total
Head                | HH     | HT     | HH, HT
Tail                | TH     | TT     | TH, TT
Total               | HH, TH | HT, TT | S

S = {HH, HT, TH, TT} is the sample space; each cell (e.g., HH) is an outcome (a simple event). The first row is the event "Head on 1st coin."

Page 27

Tree Diagram

Experiment: Toss 2 coins. Note the faces.

[Figure: a tree diagram. The first branch is H or T for the first coin; each branch splits again into H or T for the second coin, giving the four outcomes HH, HT, TH, TT.]

S = {HH, HT, TH, TT} is the sample space.

Page 28

Discrete Random Variable

– Possible values (outcomes) are discrete
  • E.g., natural numbers (0, 1, 2, 3, etc.)
– Obtained by counting
– Usually a finite number of values
  • But could be infinite (must be "countable")

Page 29

Discrete Probability Distribution (also called probability mass function (pmf))

1. List of all possible [x, p(x)] pairs
   – x = value of the random variable (outcome)
   – p(x) = probability associated with that value
2. Mutually exclusive (no overlap)
3. Collectively exhaustive (nothing left out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1

Page 30

Visualizing Discrete Probability Distributions

A discrete distribution can be shown as a listing, a table, a graph, or an equation. For the number of tails in n = 2 fair tosses:

Listing: { (0, .25), (1, .50), (2, .25) }

Table:

# Tails x | Count f(x) | p(x)
0         | 1          | .25
1         | 2          | .50
2         | 1          | .25

Equation: p(x) = n! / (x!(n-x)!) · p^x (1-p)^(n-x)

[Figure: bar graph of p(x) for x = 0, 1, 2, with heights .25, .50, .25.]
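The slide's equation is the binomial pmf. A minimal sketch in Python reproducing the table above for n = 2 fair tosses (`math.comb` computes the binomial coefficient):

```python
from fractions import Fraction
from math import comb

# pmf of X = number of tails in n fair tosses:
# p(x) = C(n, x) * p^x * (1-p)^(n-x), here n = 2 and p = 1/2.
n, p = 2, Fraction(1, 2)
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# Matches the listing { (0, .25), (1, .50), (2, .25) }.
assert pmf == {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```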

Page 31

Arity of Random Variables

• Suppose A can take on more than 2 values.
• A is a random variable with arity k if it can take on exactly one value out of {v_1, v_2, …, v_k}.
• Thus…

  P(A = v_i ∧ A = v_j) = 0 if i ≠ j

  P(A = v_1 ∨ A = v_2 ∨ … ∨ A = v_k) = 1

Page 32

Mutually Exclusive Events

• Two events E_1, E_2 are called mutually exclusive if they are disjoint: E_1 ∩ E_2 = ∅
  – Note that two mutually exclusive events cannot both occur in the same instance of a given experiment.
• For mutually exclusive events, Pr[E_1 ∪ E_2] = Pr[E_1] + Pr[E_2].

Page 33

Exhaustive Sets of Events

• A set E = {E_1, E_2, …} of events in the sample space S is called exhaustive iff ∪_i E_i = S.
• An exhaustive set E of events that are all mutually exclusive with each other has the property that

  Σ_i Pr[E_i] = 1.

Page 34

An easy fact about Multivalued Random Variables:

• Using the axioms of probability…

  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)

• And assuming that A obeys…

  P(A = v_i ∧ A = v_j) = 0 if i ≠ j
  P(A = v_1 ∨ A = v_2 ∨ … ∨ A = v_k) = 1

• It's easy to prove that

  P(A = v_1 ∨ A = v_2 ∨ … ∨ A = v_i) = Σ_{j=1}^{i} P(A = v_j)

• And thus we can prove

  Σ_{j=1}^{k} P(A = v_j) = 1

Page 35

Another fact about Multivalued Random Variables:

• Using the axioms of probability…

  0 <= P(A) <= 1, P(True) = 1, P(False) = 0
  P(A or B) = P(A) + P(B) - P(A and B)

• And assuming that A obeys…

  P(A = v_i ∧ A = v_j) = 0 if i ≠ j
  P(A = v_1 ∨ A = v_2 ∨ … ∨ A = v_k) = 1

• It's easy to prove that

  P(B ∧ [A = v_1 ∨ A = v_2 ∨ … ∨ A = v_i]) = Σ_{j=1}^{i} P(B ∧ A = v_j)

Page 36

Elementary Probability Rules

• P(~A) + P(A) = 1
• P(B) = P(B ^ A) + P(B ^ ~A)

  Σ_{j=1}^{k} P(A = v_j) = 1

  P(B) = Σ_{j=1}^{k} P(B ∧ A = v_j)

Page 37

Bernoulli Trials

• Each performance of an experiment with only two possible outcomes is called a Bernoulli trial.
• In general, a possible outcome of a Bernoulli trial is called a success or a failure.
• If p is the probability of a success and q is the probability of a failure, then p + q = 1.

Page 38

Example

A coin is biased so that the probability of heads is 2/3. What is the probability that exactly four heads come up when the coin is flipped seven times, assuming that the flips are independent?

The number of ways that we can get four heads is C(7,4) = 7!/(4!3!) = 7·5 = 35.

The probability of getting four heads and three tails is (2/3)^4 (1/3)^3 = 16/3^7.

p(4 heads and 3 tails) = C(7,4) (2/3)^4 (1/3)^3 = 35·16/3^7 = 560/2187
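The arithmetic above can be checked exactly in a few lines of Python:

```python
from fractions import Fraction
from math import comb

p = Fraction(2, 3)                    # probability of heads
prob = comb(7, 4) * p**4 * (1 - p)**3 # C(7,4) (2/3)^4 (1/3)^3
print(prob)                           # 560/2187
```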

Page 39

Probability of k successes in n independent Bernoulli trials.

The probability of k successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1-p, is C(n,k) p^k q^(n-k).

Page 40

Find each of the following probabilities when n independent Bernoulli trials are carried out

with probability of success, p.

• Probability of no successes:

  C(n,0) p^0 q^(n-0) = 1·(p^0)(1-p)^n = (1-p)^n

• Probability of at least one success:

  1 - (1-p)^n (why?)

Page 41

Find each of the following probabilities when n independent Bernoulli trials are carried out

with probability of success, p.

• Probability of at most one success.

This means there can be no successes or one success:

  C(n,0) p^0 q^(n-0) + C(n,1) p^1 q^(n-1)
  = (1-p)^n + np(1-p)^(n-1)

• Probability of at least two successes:

  1 - (1-p)^n - np(1-p)^(n-1)
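The four quantities from these two slides can be verified against the general binomial formula. A sketch in Python; the values n = 5 and p = 1/3 are illustrative, not from the slides:

```python
from fractions import Fraction
from math import comb

def binom(n, k, p):
    """P(exactly k successes in n independent Bernoulli trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, Fraction(1, 3)                         # illustrative values

no_success   = binom(n, 0, p)                    # (1-p)^n
at_least_one = 1 - no_success
at_most_one  = binom(n, 0, p) + binom(n, 1, p)   # (1-p)^n + np(1-p)^(n-1)
at_least_two = 1 - at_most_one

assert no_success == (1 - p)**n
assert at_most_one == (1 - p)**n + n * p * (1 - p)**(n - 1)
```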

Page 42

A coin is flipped until it comes up tails. The probability the coin comes up tails is p.

• What is the probability that the experiment ends after n flips, that is, the outcome consists of n-1 heads and a tail?

(1-p)^(n-1) p
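This is the geometric distribution; the probabilities over all n sum to 1 as a geometric series. A sketch in Python, with p = 1/2 as an illustrative value:

```python
from fractions import Fraction

p = Fraction(1, 2)              # illustrative tail probability

def ends_after(n, p):
    """P(first tail on flip n): n-1 heads followed by a tail."""
    return (1 - p)**(n - 1) * p

for n in range(1, 5):
    print(n, ends_after(n, p))  # 1 1/2, 2 1/4, 3 1/8, 4 1/16
```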

Page 43

Probability vs. Odds

• You may have heard the term "odds."
  – It is widely used in the gambling community.
• This is not the same thing as probability!
  – But it is very closely related.
• The odds in favor of an event E means the relative probability of E compared with its complement ¬E:

  O(E) :≡ Pr(E)/Pr(¬E)

  – E.g., if p(E) = 0.6 then p(¬E) = 0.4 and O(E) = 0.6/0.4 = 1.5.
• Odds are conventionally written as a ratio of integers.
  – E.g., 3/2 or 3:2 in the above example. "Three to two in favor."
• The odds against E just means 1/O(E). "2 to 3 against"

Exercise: Express the probability p as a function of the odds in favor, O.
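One way to check the conversion in both directions (note the inverse p = O/(1+O) is the answer to the slide's exercise, so skip this sketch if you want to work it out first):

```python
from fractions import Fraction

def odds(p):
    """Odds in favor: O(E) = Pr(E) / Pr(not E)."""
    return p / (1 - p)

def prob(O):
    """Inverse conversion: p = O / (1 + O)."""
    return O / (1 + O)

p = Fraction(3, 5)       # the slide's example, p(E) = 0.6
O = odds(p)
print(O, prob(O))        # 3/2 3/5
```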

Page 44

Example 1: Balls-and-Urn

• Suppose an urn contains 4 blue balls and 5 red balls.
• An example experiment: Shake up the urn, reach in (without looking), and pull out a ball.
• A random variable V: the identity of the chosen ball.
• The sample space S: the set of all possible values of V:
  – In this case, S = {b1, …, b9}
• An event E: "The ball chosen is blue": E = { ______________ }
• What are the odds in favor of E?
• What is the probability of E?

[Figure: an urn containing nine balls labeled b1 through b9, four blue and five red.]

Page 45

Independent Events

• Two events E, F are called independent if Pr[E∩F] = Pr[E]·Pr[F].

• Relates to the product rule for the number of ways of doing two independent tasks.

• Example: Flip a coin, and roll a die.
  Pr[(coin shows heads) ∧ (die shows 1)] = Pr[coin is heads] × Pr[die is 1] = ½ × 1/6 = 1/12.

Page 46

Example

Suppose a red die and a blue die are rolled. The sample space:

  1 2 3 4 5 6
1 x x x x x x
2 x x x x x x
3 x x x x x x
4 x x x x x x
5 x x x x x x
6 x x x x x x

Are the events sum is 7 and the blue die is 3 independent?

Page 47

|S| = 36

  1 2 3 4 5 6
1 x x x x x x
2 x x x x x x
3 x x x x x x
4 x x x x x x
5 x x x x x x
6 x x x x x x

|sum is 7| = 6

|blue die is 3| = 6

| in intersection | = 1

p(sum is 7 and blue die is 3) = 1/36

p(sum is 7) · p(blue die is 3) = 6/36 · 6/36 = 1/36

Thus, p((sum is 7) and (blue die is 3)) = p(sum is 7) · p(blue die is 3).

The events "sum is 7" and "the blue die is 3" are independent.
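The counts above can be verified by enumerating the 36-outcome sample space. A minimal sketch in Python:

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))   # (red, blue), 36 outcomes

def pr(event):
    """Laplacian probability: fraction of outcomes satisfying the event."""
    return Fraction(len([s for s in S if event(s)]), len(S))

sum7  = lambda s: s[0] + s[1] == 7
blue3 = lambda s: s[1] == 3
both  = lambda s: sum7(s) and blue3(s)

print(pr(both), pr(sum7) * pr(blue3))      # 1/36 1/36 -> independent
```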

Page 48

Conditional Probability

• Let E, F be any events such that Pr[F] > 0.
• Then, the conditional probability of E given F, written Pr[E|F], is defined as Pr[E|F] :≡ Pr[E∩F]/Pr[F].
• This is what our probability that E would turn out to occur should be, if we are given only the information that F occurs.
• If E and F are independent, then Pr[E|F] = Pr[E]:

  Pr[E|F] = Pr[E∩F]/Pr[F] = Pr[E]×Pr[F]/Pr[F] = Pr[E]
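The definition translates directly into code. A sketch in Python, reusing the three-coin-flip sample space from earlier slides:

```python
from itertools import product
from fractions import Fraction

S = list(product("HT", repeat=3))          # 3 fair coin flips

def pr(event):
    """Laplacian probability of an event (a predicate on outcomes)."""
    return Fraction(len([s for s in S if event(s)]), len(S))

def pr_given(E, F):
    """Pr[E|F] := Pr[E and F] / Pr[F], requiring Pr[F] > 0."""
    return pr(lambda s: E(s) and F(s)) / pr(F)

E = lambda s: s.count("T") % 2 == 1        # odd number of tails
F = lambda s: s[0] == "T"                  # first flip is tails

print(pr_given(E, F), pr(E))               # 1/2 1/2 -> E, F independent
```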

Page 49

Visualizing Conditional Probability

• If we are given that event F occurs, then
  – Our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing F) corresponds to the fraction of F where E occurs also.
• Thus, p′(E) = p(E∩F)/p(F).

[Figure: the entire sample space S, with overlapping events E and F; the intersection E∩F is highlighted.]

Page 50: 6.2 Probability Theory Longin Jan Latecki Temple University


Conditional Probability Example

• Suppose I choose a single letter out of the 26-letter English alphabet, totally at random.
– Use the Laplacian assumption on the sample space {a, b, ..., z}.
– What is the (prior) probability that the letter is a vowel?
• Pr[Vowel] = __ / __ .
• Now, suppose I tell you that the letter chosen happened to be in the first 9 letters of the alphabet.
– Now, what is the conditional (or posterior) probability that the letter is a vowel, given this information?
• Pr[Vowel | First9] = ___ / ___ .

[Figure: the sample space S of all 26 letters, with the vowels and the first 9 letters marked as overlapping regions.]
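One way to check the blanks on this slide is to count the sets directly; a Python sketch, assuming the usual five vowels a, e, i, o, u:

```python
import string
from fractions import Fraction

letters = set(string.ascii_lowercase)      # Laplacian sample space, |S| = 26
vowels = set("aeiou")                      # assumption: the five usual vowels
first9 = set(string.ascii_lowercase[:9])   # 'a' through 'i'

p_vowel = Fraction(len(vowels), len(letters))   # prior probability
# Conditioning on First9 restricts attention to that subspace:
p_vowel_given_first9 = Fraction(len(vowels & first9), len(first9))
```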

Page 51: 6.2 Probability Theory Longin Jan Latecki Temple University


Example

• What is the probability that, if we flip a coin three times, we get an odd number of tails (event E), given that the event F, "the first flip comes up tails," occurs?

The eight outcomes are (TTT), (TTH), (THT), (THH), (HTT), (HTH), (HHT), (HHH).

Given F, only the four outcomes beginning with T remain, each with probability 1/4, so

p(E|F) = p(TTT) + p(THH) = 1/4 + 1/4 = ½, where E = odd number of tails.
Alternatively, p(E|F) = p(E∩F)/p(F) = (2/8)/(4/8) = ½.
For comparison, p(E) = 4/8 = ½. E and F are independent, since p(E|F) = p(E).
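The same answer falls out of enumerating the eight outcomes; a minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))   # 8 equally likely flip sequences

E = [o for o in outcomes if o.count("T") % 2 == 1]   # odd number of tails
F = [o for o in outcomes if o[0] == "T"]             # first flip is tails

p = lambda ev: Fraction(len(ev), len(outcomes))
p_EF = p([o for o in E if o in F])

p_E_given_F = p_EF / p(F)   # (2/8)/(4/8) = 1/2
```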

Page 52: 6.2 Probability Theory Longin Jan Latecki Temple University


Prior and Posterior Probability
• Suppose that, before you are given any information about the outcome of an experiment, your personal probability for an event E to occur is p(E) = Pr[E].
– The probability of E in your original probability distribution p is called the prior probability of E.
• This is its probability prior to obtaining any information about the outcome.
• Now, suppose someone tells you that some event F (which may overlap with E) actually occurred in the experiment.
– Then, you should update your personal probability for event E to occur, to become p′(E) = Pr[E|F] = p(E∩F)/p(F).
• This is the conditional probability of E, given F.
– The probability of E in your new probability distribution p′ is called the posterior probability of E.
• This is its probability after learning that event F occurred.
• After seeing F, the posterior distribution p′ is defined by letting p′(v) = p({v}∩F)/p(F) for each individual outcome v ∈ S.

Page 53: 6.2 Probability Theory Longin Jan Latecki Temple University


6.3 Bayes’ Theorem

Longin Jan Latecki
Temple University

Slides for a Course Based on the Text
Discrete Mathematics & Its Applications (6th Edition), Kenneth H. Rosen,
based on slides by Michael P. Frank and Wolfram Burgard

Page 54: 6.2 Probability Theory Longin Jan Latecki Temple University


Bayes’ Rule
• One way to compute the probability that a hypothesis H is correct, given some data D:

Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]

• This follows directly from the definition of conditional probability! (Exercise: Prove it.)
• This rule is the foundation of Bayesian methods for probabilistic reasoning, which are very powerful, and widely used in artificial intelligence applications:
– For data mining, automated diagnosis, pattern recognition, statistical modeling, even evaluating scientific hypotheses!

Rev. Thomas Bayes, 1702–1761

Page 55: 6.2 Probability Theory Longin Jan Latecki Temple University


Bayes’ Theorem

• Allows one to compute the probability that a hypothesis H is correct, given data D:

Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]

• When the set of hypotheses Hj is exhaustive (and mutually exclusive), the denominator expands by the law of total probability:

Pr[Hi|D] = Pr[D|Hi]·Pr[Hi] / Σj Pr[D|Hj]·Pr[Hj]

Page 56: 6.2 Probability Theory Longin Jan Latecki Temple University


Example 1: Two boxes with balls
• Two boxes: first: 2 blue and 7 red balls; second: 4 blue and 3 red balls
• Bob selects a ball by first choosing one of the two boxes, and then one ball from this box.
• If Bob has selected a red ball, what is the probability that he selected a ball from the first box?
• An event E: Bob has chosen a red ball.
• An event F: Bob has chosen a ball from the first box.
• We want to find p(F | E).
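Working this out with Bayes’ theorem; a Python sketch, assuming Bob picks each box with probability 1/2:

```python
from fractions import Fraction

p_F = Fraction(1, 2)              # assumption: each box equally likely
p_E_given_F = Fraction(7, 9)      # first box: 2 blue, 7 red
p_E_given_notF = Fraction(3, 7)   # second box: 4 blue, 3 red

# Law of total probability for "Bob drew a red ball":
p_E = p_E_given_F * p_F + p_E_given_notF * (1 - p_F)

# Bayes' rule: p(F|E) = p(E|F) p(F) / p(E)
p_F_given_E = p_E_given_F * p_F / p_E
```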

Page 57: 6.2 Probability Theory Longin Jan Latecki Temple University


Example 2:
• Suppose 1% of the population has AIDS
• Prob. that a positive result is right: 95%
• Prob. that a negative result is right: 90%
• What is the probability that someone who has a positive result is actually an AIDS patient?

• H: event that a person has AIDS
• D: event of a positive result
• P[D|H] = 0.95;  P[D|¬H] = 1 − 0.9 = 0.1
• P[D] = P[D|H]P[H] + P[D|¬H]P[¬H] = 0.95·0.01 + 0.1·0.99 = 0.1085
• P[H|D] = P[D|H]P[H]/P[D] = 0.95·0.01/0.1085 ≈ 0.0876
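The arithmetic on this slide, as a runnable Python sketch:

```python
p_H = 0.01                  # 1% of the population has AIDS
p_D_given_H = 0.95          # positive result is right 95% of the time
p_D_given_notH = 1 - 0.90   # negative result is right 90% of the time

# Total probability of a positive result:
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)

# Bayes' rule: P[H|D] = P[D|H] P[H] / P[D]
p_H_given_D = p_D_given_H * p_H / p_D
```

Note how small the posterior is: even with a fairly accurate test, most positives come from the large healthy population.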

Page 58: 6.2 Probability Theory Longin Jan Latecki Temple University


What’s behind door number three?
• The Monty Hall problem paradox
– Consider a game show where a prize (a car) is behind one of three doors
– The other two doors do not have prizes (goats instead)
– After you pick one of the doors, the host (Monty Hall) opens a different door to show you that the door he opened is not the prize
– Do you change your decision?
• Your initial probability to win (i.e. pick the right door) is 1/3
• What is your chance of winning if you change your choice after Monty opens a wrong door?
• After Monty opens a wrong door, if you change your choice, your chance of winning is 2/3
– Thus, your chance of winning doubles if you change
– Huh?

Page 59: 6.2 Probability Theory Longin Jan Latecki Temple University


Page 60: 6.2 Probability Theory Longin Jan Latecki Temple University


Monty Hall Problem

Ci - The car is behind Door i, for i equal to 1, 2 or 3.

Hij - The host opens Door j after the player has picked Door i, for i and j equal to 1, 2 or 3. Without loss of generality, assume, by re-numbering the doors if necessary, that the player picks Door 1, and that the host then opens Door 3, revealing a goat. In other words, the host makes proposition H13 true.

Then the posterior probability of winning by not switching doors is P(C1|H13).

P(Ci) = 1/3 for i = 1, 2, 3.

Page 61: 6.2 Probability Theory Longin Jan Latecki Temple University

P(C1|H13) = P(H13|C1) P(C1) / P(H13)
          = P(H13|C1) P(C1) / [ P(H13|C1)P(C1) + P(H13|C2)P(C2) + P(H13|C3)P(C3) ]

P(H13|C1) = 1/2, since the host will always open a door that has no car behind it, chosen from among the two not picked by the player (which are 2 and 3 here). Similarly, P(H13|C2) = 1 and P(H13|C3) = 0. Hence

P(C1|H13) = (1/2 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = (1/6) / (1/2) = 1/3

Page 62: 6.2 Probability Theory Longin Jan Latecki Temple University

The probability of winning by switching is P(C2|H13), since under our assumption switching means switching the selection to Door 2, since P(C3|H13) = 0 (the host will never open the door with the car).

P(C2|H13) = P(H13|C2) P(C2) / P(H13)
          = P(H13|C2) P(C2) / [ P(H13|C1)P(C1) + P(H13|C2)P(C2) + P(H13|C3)P(C3) ]
          = (1 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3)
          = (1/3) / (1/2)
          = 2/3

The posterior probability of winning by not switching doors is P(C1|H13) = 1/3.
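The 1/3 vs. 2/3 split can also be checked by simulation rather than by Bayes’ theorem. A small Monte Carlo sketch in Python (the `play` helper and door numbering are illustrative assumptions, not part of the slides):

```python
import random

def play(switch, rng):
    """One round of the game; returns True if the player wins the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # Host opens a door that is neither the player's pick nor the car.
    opened = next(d for d in range(3) if d != pick and d != car)
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

trials = 100_000
rng = random.Random(0)   # fixed seed for reproducibility
wins_stay = sum(play(False, rng) for _ in range(trials)) / trials
wins_switch = sum(play(True, rng) for _ in range(trials)) / trials
```

With 100,000 trials the stay/switch win rates land close to 1/3 and 2/3, matching the calculation above.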

Page 63: 6.2 Probability Theory Longin Jan Latecki Temple University


Exercises: 6, p. 424, and 16, p. 425

Page 64: 6.2 Probability Theory Longin Jan Latecki Temple University

Page 65: 6.2 Probability Theory Longin Jan Latecki Temple University

Page 66: 6.2 Probability Theory Longin Jan Latecki Temple University

Page 67: 6.2 Probability Theory Longin Jan Latecki Temple University

Continuous random variable

Page 68: 6.2 Probability Theory Longin Jan Latecki Temple University


Continuous Prob. Density Function

1. Mathematical Formula
2. Shows All Values, x, and Frequencies, f(x)
– f(x) Is Not Probability
3. Properties:

∫ f(x) dx = 1   (area under curve, integrating over all x)
f(x) ≥ 0,  a ≤ x ≤ b

[Figure: a density curve f(x) over values x from a to b; the height at each value is its frequency f(x), and the total area under the curve is 1.]

Page 69: 6.2 Probability Theory Longin Jan Latecki Temple University


Continuous Random Variable Probability

Probability Is Area Under Curve!

P(c ≤ x ≤ d) = ∫cd f(x) dx

[Figure: a density curve f(x) with the area between x = c and x = d shaded.]

Page 70: 6.2 Probability Theory Longin Jan Latecki Temple University


Probability mass function

In probability theory, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. A pmf differs from a probability density function (pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range.

Example graphs of pmfs. All the values of a pmf must be non-negative and sum up to 1. (Right) The pmf of a fair die. (All the numbers on the die have an equal chance of appearing on top when the die is rolled.)

Page 71: 6.2 Probability Theory Longin Jan Latecki Temple University

Suppose that X is a discrete random variable, taking values on some countable sample space S ⊆ R. Then the probability mass function fX(x) for X is given by

fX(x) = P(X = x),

thus P(X = x) = fX(x).

Note that this explicitly defines fX(x) for all real numbers, including all values in R that X could never take; indeed, it assigns such values a probability of zero.

Example. Suppose that X is the outcome of a single coin toss, assigning 0 to tails and 1 to heads. The probability that X = x is 0.5 on the state space {0, 1} (this is a Bernoulli random variable), and hence the probability mass function is

fX(x) = 1/2 for x ∈ {0, 1}, and fX(x) = 0 otherwise.
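A pmf is just a function that is nonzero only on the values the variable can take; a Python sketch for the fair-die pmf pictured on the earlier slide:

```python
from fractions import Fraction

def pmf_die(x):
    # Nonzero only on {1,...,6}; every other real value gets probability 0.
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

total = sum(pmf_die(x) for x in range(1, 7))   # all pmf values sum to 1
```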

Page 72: 6.2 Probability Theory Longin Jan Latecki Temple University

Uniform Distribution

1. Equally Likely Outcomes
2. Probability Density:

f(x) = 1/(d − c),  c ≤ x ≤ d

3. Mean & Standard Deviation:

Mean: (c + d)/2
Standard deviation: (d − c)/√12

(For the uniform distribution the median equals the mean.)

[Figure: a flat density f(x) of height 1/(d − c) on the interval from c to d.]

Page 73: 6.2 Probability Theory Longin Jan Latecki Temple University


Uniform Distribution Example

• You’re the production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive.
• Suppose the amount dispensed has a uniform distribution.
• What is the probability that less than 11.8 oz. is dispensed?

Page 74: 6.2 Probability Theory Longin Jan Latecki Temple University

Uniform Distribution Solution

f(x) = 1/(d − c) = 1/(12.5 − 11.5) = 1.0

P(11.5 ≤ x ≤ 11.8) = (Base)(Height) = (11.8 − 11.5)(1) = 0.30

[Figure: the uniform density of height 1.0 on [11.5, 12.5], with the area from 11.5 to 11.8 shaded.]
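Because a uniform density is flat, the probability is just base × height; a Python sketch (`uniform_prob` is an illustrative helper, not from the slides):

```python
def uniform_prob(c, d, lo, hi):
    """P(lo <= X <= hi) for X uniform on [c, d]: base times height."""
    lo, hi = max(lo, c), min(hi, d)     # clip the interval to the support
    return max(hi - lo, 0.0) / (d - c)  # height of the density is 1/(d - c)

p = uniform_prob(11.5, 12.5, 11.5, 11.8)   # the bottling example: 0.30
```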

Page 75: 6.2 Probability Theory Longin Jan Latecki Temple University


Normal Distribution

1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
– Example: Binomial
3. Basis for Classical Statistical Inference
4. A.k.a. Gaussian distribution

Page 76: 6.2 Probability Theory Longin Jan Latecki Temple University


Normal Distribution

1. ‘Bell-Shaped’ & Symmetrical
2. Mean, Median, Mode Are Equal
3. Random Variable Has Infinite Range

[Figure: a bell curve f(X) centered at the mean.]

* light-tailed distribution

Page 77: 6.2 Probability Theory Longin Jan Latecki Temple University

Probability Density Function

f(x) = (1 / (σ√(2π))) · e^( −(1/2)((x − μ)/σ)² )

f(x) = frequency of random variable x
σ = population standard deviation
π = 3.14159; e = 2.71828
x = value of random variable (−∞ < x < ∞)
μ = population mean

Page 78: 6.2 Probability Theory Longin Jan Latecki Temple University

Effect of Varying Parameters (μ & σ)

[Figure: three normal curves A, B, and C with differing means and standard deviations.]

Page 79: 6.2 Probability Theory Longin Jan Latecki Temple University

Normal Distribution Probability

P(c ≤ x ≤ d) = ∫cd f(x) dx = ?

Probability is area under curve!

[Figure: a normal density f(x) with the area between x = c and x = d shaded.]
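The normal density has no closed-form antiderivative, so this area is computed numerically in practice. A Python sketch using the standard identity Φ(z) = (1 + erf(z/√2))/2:

```python
import math

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2.
    z = (x - mu) / sigma
    return (1 + math.erf(z / math.sqrt(2))) / 2

def normal_prob(c, d, mu, sigma):
    """P(c <= X <= d): area under the normal curve between c and d."""
    return normal_cdf(d, mu, sigma) - normal_cdf(c, mu, sigma)

# About 68.3% of the distribution lies within one sigma of the mean.
p_one_sigma = normal_prob(-1.0, 1.0, 0.0, 1.0)
```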

Page 80: 6.2 Probability Theory Longin Jan Latecki Temple University


Infinite Number of Tables

Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
That’s an infinite number!

[Figure: several normal curves f(X) with different means and standard deviations.]

Page 81: 6.2 Probability Theory Longin Jan Latecki Temple University

Standardize the Normal Distribution

Z = (X − μ) / σ

Standardizing maps any normal distribution (mean μ, standard deviation σ) to the standardized normal distribution, which has μ = 0 and σ = 1. One table!

[Figure: a normal curve in X transformed to the standardized normal curve in Z.]

Page 82: 6.2 Probability Theory Longin Jan Latecki Temple University


Intuitions on Standardizing

• Subtracting μ from each value X just moves the curve around, so values are centered on 0 instead of on μ
• Once the curve is centered, dividing each value by σ > 1 moves all values toward 0, pressing the curve

Page 83: 6.2 Probability Theory Longin Jan Latecki Temple University

Standardizing Example

A normal distribution has μ = 5 and σ = 10. Standardize the value X = 6.2:

Z = (X − μ)/σ = (6.2 − 5)/10 = .12

Page 84: 6.2 Probability Theory Longin Jan Latecki Temple University

Standardizing Example (continued)

Z = (X − μ)/σ = (6.2 − 5)/10 = .12

[Figure: the normal distribution (μ = 5, σ = 10) with X = 6.2 marked, and the corresponding standardized normal distribution (μ = 0, σ = 1) with Z = .12 marked.]
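The standardization on these slides is a one-liner; a Python sketch:

```python
def z_score(x, mu, sigma):
    # Standardize: shift by the mean, then scale by the standard deviation.
    return (x - mu) / sigma

z = z_score(6.2, 5, 10)   # the slide's example: (6.2 - 5)/10 = .12
```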

Page 85: 6.2 Probability Theory Longin Jan Latecki Temple University


Page 86: 6.2 Probability Theory Longin Jan Latecki Temple University


6.4 Expected Value and Variance

Longin Jan Latecki
Temple University

Slides for a Course Based on the Text
Discrete Mathematics & Its Applications (6th Edition), Kenneth H. Rosen,
based on slides by Michael P. Frank

Page 87: 6.2 Probability Theory Longin Jan Latecki Temple University

Expected Values
• For any random variable V having a numeric domain, its expectation value or expected value or weighted average value or (arithmetic) mean value Ex[V], under the probability distribution Pr[v] = p(v), is defined as

Ex[V] :≡ Σv∈dom[V] p(v)·v

• The term “expected value” is very widely used for this.
– But this term is somewhat misleading, since the “expected” value might itself be totally unexpected, or even impossible!
• E.g., if p(0) = 0.5 & p(2) = 0.5, then Ex[V] = 1, even though p(1) = 0 and so we know that V ≠ 1!
• Or, if p(0) = 0.5 & p(1) = 0.5, then Ex[V] = 0.5 even if V is an integer variable!

Page 88: 6.2 Probability Theory Longin Jan Latecki Temple University

Derived Random Variables

• Let S be a sample space over values of a random variable V (representing possible outcomes).
• Then, any function f over S can also be considered to be a random variable (whose actual value f(V) is derived from the actual value of V).
• If the range R = range[f] of f is numeric, then the mean value Ex[f] of f can still be defined, as

Ex[f] :≡ Σs∈S p(s)·f(s)

Page 89: 6.2 Probability Theory Longin Jan Latecki Temple University

Recall that a random variable X is actually a function
f: S → X(S),
where S is the sample space and X(S) is the range of X. This fact implies that the expected value of X is

E(X) = Σs∈S p(s)·X(s) = Σr∈X(S) p(X = r)·r

Example 1. Expected Value of a Die.
Let X be the number that comes up when a die is rolled.

E(X) = Σr∈{1,...,6} p(X = r)·r = (1/6) Σr∈{1,...,6} r = 7/2
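The die computation, written out in Python with exact fractions:

```python
from fractions import Fraction

# E(X) = sum over r in X(S) of p(X = r) * r, with p(X = r) = 1/6 for a fair die.
E_X = sum(Fraction(1, 6) * r for r in range(1, 7))   # = 7/2
```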

Page 90: 6.2 Probability Theory Longin Jan Latecki Temple University

Example 2

• A fair coin is flipped 3 times. Let S be the sample space of 8 possible outcomes, and let X be a random variable that assigns to an outcome the number of heads in this outcome.

E(X) = 1/8[X(TTT) + X(TTH) + X(THH) + X(HTT) + X(HHT) + X(HHH) + X(THT) + X(HTH)]
     = 1/8[0 + 1 + 2 + 1 + 2 + 3 + 1 + 2] = 12/8 = 3/2

Page 91: 6.2 Probability Theory Longin Jan Latecki Temple University

Linearity of Expectation Values

• Let X1, X2 be any two random variables derived from the same sample space S, and subject to the same underlying distribution.
• Then we have the following theorems:

Ex[X1 + X2] = Ex[X1] + Ex[X2]
Ex[aX1 + b] = a·Ex[X1] + b

• You should be able to easily prove these for yourself at home.
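Both linearity theorems can be checked concretely on a small sample space; a Python sketch using two fair dice (an illustrative choice of S, not from the slides):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice
p = Fraction(1, len(S))                    # uniform distribution on S

def Ex(f):
    # Expectation of a random variable f derived from the sample space S.
    return sum(p * f(s) for s in S)

X1 = lambda s: s[0]
X2 = lambda s: s[1]

sum_rule = Ex(lambda s: X1(s) + X2(s)) == Ex(X1) + Ex(X2)
affine_rule = Ex(lambda s: 3 * X1(s) + 2) == 3 * Ex(X1) + 2
```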

Page 92: 6.2 Probability Theory Longin Jan Latecki Temple University

Variance & Standard Deviation

• The variance Var[X] = σ²(X) of a random variable X is the expected value of the square of the difference between the value of X and its expectation value Ex[X]:

Var[X] :≡ Σs∈S p(s)·(X(s) − Ex[X])²

• The standard deviation or root-mean-square (RMS) difference of X is σ(X) :≡ Var[X]^(1/2).

Page 93: 6.2 Probability Theory Longin Jan Latecki Temple University

Example 15

• What is the variance of the random variable X, where X is the number that comes up when a die is rolled?

• V(X) = E(X²) − E(X)²

E(X²) = 1/6[1² + 2² + 3² + 4² + 5² + 6²] = 91/6

V(X) = 91/6 − (7/2)² = 35/12 ≈ 2.92
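The same computation in Python, checking both the shortcut V(X) = E(X²) − E(X)² and the definitional form of variance:

```python
from fractions import Fraction

p = Fraction(1, 6)                            # fair die
E_X = sum(p * r for r in range(1, 7))         # 7/2
E_X2 = sum(p * r * r for r in range(1, 7))    # 91/6

V_X = E_X2 - E_X ** 2                         # 91/6 - 49/4 = 35/12

# Definitional form: expected squared deviation from the mean.
V_alt = sum(p * (r - E_X) ** 2 for r in range(1, 7))
```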