Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann,...
![Page 1: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/1.jpg)
Probability Theory
Longin Jan Latecki, Temple University
Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop
![Page 2: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/2.jpg)
What is reasoning?
• How do we infer properties of the world?
• How should computers do it?
![Page 3: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/3.jpg)
Aristotelian logic
• If A is true, then B is true
• A is true
• Therefore, B is true

A: My car was stolen
B: My car isn’t where I left it
![Page 4: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/4.jpg)
The real world is uncertain
Problems with pure logic:
• Don’t have perfect information
• Don’t really know the model
• Model is non-deterministic

So let’s build a logic of uncertainty!
![Page 5: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/5.jpg)
Beliefs
Let B(A) = “belief A is true”
B(¬A) = “belief A is false”

e.g., A = “my car was stolen”
B(A) = “belief my car was stolen”
![Page 6: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/6.jpg)
Reasoning with beliefs
Cox Axioms [Cox 1946]
1. Ordering exists
– e.g., B(A) > B(B) > B(C)
2. Negation function exists
– B(¬A) = f(B(A))
3. Product function exists
– B(A ∧ Y) = g(B(A|Y), B(Y))
This is all we need!
![Page 7: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/7.jpg)
The Cox Axioms uniquely define a complete system of reasoning: this is probability theory!
![Page 8: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/8.jpg)
Principle #1:
“Probability theory is nothing more than common sense reduced to calculation.”
- Pierre-Simon Laplace, 1814
![Page 9: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/9.jpg)
Definitions
P(A) = “probability A is true” = B(A) = “belief A is true”

P(A) ∈ [0, 1]
P(A) = 1 iff “A is true”
P(A) = 0 iff “A is false”

P(A|B) = “prob. of A if we knew B”
P(A, B) = “prob. A and B”
![Page 10: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/10.jpg)
Examples
A: “my car was stolen”
B: “I can’t find my car”

P(A) = .1
P(A) = .5

P(B | A) = .99
P(A | B) = .3
![Page 11: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/11.jpg)
Basic rules
Sum rule:
P(A) + P(¬A) = 1

Example:
A: “it will rain today”
p(A) = .9, p(¬A) = .1
![Page 12: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/12.jpg)
Basic rules
Sum rule:
Σ_i P(A_i) = 1
when exactly one of the A_i must be true
![Page 13: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/13.jpg)
Basic rules
Product rule:
P(A,B) = P(A|B) P(B) = P(B|A) P(A)
![Page 14: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/14.jpg)
Basic rules
Conditioning

Sum rule: Σ_i P(A_i) = 1 ⇒ Σ_i P(A_i|B) = 1
Product rule: P(A,B) = P(A|B) P(B) ⇒ P(A,B|C) = P(A|B,C) P(B|C)
![Page 15: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/15.jpg)
Summary
Product rule: P(A,B) = P(A|B) P(B)
Sum rule: Σ_i P(A_i) = 1

All derivable from Cox axioms; must obey rules of common sense
Now we can derive new rules
![Page 16: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/16.jpg)
Example
A = you eat a good meal tonight
B = you go to a highly-recommended restaurant
¬B = you go to an unknown restaurant
Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5
What is P(A)?
![Page 17: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/17.jpg)
Example, continued
Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5

1 = P(B) + P(¬B)   (sum rule)
1 = P(B|A) + P(¬B|A)   (conditioning)
P(A) = P(B|A)P(A) + P(¬B|A)P(A)
= P(A,B) + P(A,¬B)   (product rule)
= P(A|B)P(B) + P(A|¬B)P(¬B)   (product rule)
= .8 × .7 + .5 × (1 − .7) = .71
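The derivation above can be checked numerically. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original slides, while the three model values are the ones given in the example.

```python
# Model values from the restaurant example; names are illustrative.
p_b = 0.7              # P(B): you go to a highly recommended restaurant
p_a_given_b = 0.8      # P(A|B): good meal given a recommended restaurant
p_a_given_not_b = 0.5  # P(A|not B): good meal given an unknown restaurant

# Sum rule: P(not B) = 1 - P(B)
p_not_b = 1.0 - p_b

# Product rule plus marginalization:
# P(A) = P(A|B) P(B) + P(A|not B) P(not B)
p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b

print(round(p_a, 2))
```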
![Page 18: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/18.jpg)
Basic rules
Marginalizing

P(A) = Σ_i P(A, B_i)
for mutually exclusive B_i, exactly one of which must be true
e.g., p(A) = p(A,B) + p(A, ¬B)
![Page 19: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/19.jpg)
Principle #2:
Given a complete model, we can derive any other probability
![Page 20: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/20.jpg)
Inference
Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5

If we know A, what is P(B|A)? (“Inference”)

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

Bayes’ Rule:
P(B|A) = P(A|B) P(B) / P(A) = .8 × .7 / .71 ≈ .79
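As a quick check of the inference step, here is a small Python sketch of Bayes' rule applied to the restaurant model. The function name `bayes_posterior` is a hypothetical helper introduced only for this illustration.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(B|A) = P(A|B) P(B) / P(A). Hypothetical helper for illustration."""
    return likelihood * prior / evidence

# Model values from the slides.
p_b = 0.7
p_a_given_b = 0.8
p_a_given_not_b = 0.5

# Evidence P(A) by marginalizing over B and not B.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)

p_b_given_a = bayes_posterior(p_a_given_b, p_b, p_a)
print(round(p_b_given_a, 2))
```

Knowing that the meal was good raises the belief in the recommended restaurant from the prior .7 to a posterior of roughly .79.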
![Page 21: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/21.jpg)
Inference
Bayes’ Rule

P(M|D) = P(D|M) P(M) / P(D)

Posterior: P(M|D)
Likelihood: P(D|M)
Prior: P(M)
![Page 22: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/22.jpg)
Principle #3:
Describe your model of the world, and then compute the probabilities of the unknowns given the observations
![Page 23: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/23.jpg)
Principle #3a:
Use Bayes’ Rule to infer unknown model variables from observed data

P(M|D) = P(D|M) P(M) / P(D)

Posterior: P(M|D)
Likelihood: P(D|M)
Prior: P(M)
![Page 24: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/24.jpg)
Bayes’ Theorem
posterior ∝ likelihood × prior

Rev. Thomas Bayes, 1702-1761
![Page 25: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/25.jpg)
![Page 26: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/26.jpg)
![Page 27: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/27.jpg)
![Page 28: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/28.jpg)
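Example
Suppose a red die and a blue die are rolled. The sample space:

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | x | x | x | x | x | x |
| 2 | x | x | x | x | x | x |
| 3 | x | x | x | x | x | x |
| 4 | x | x | x | x | x | x |
| 5 | x | x | x | x | x | x |
| 6 | x | x | x | x | x | x |

Are the events “sum is 7” and “the blue die is 3” independent?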
![Page 29: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/29.jpg)
|S| = 36
|sum is 7| = 6
|blue die is 3| = 6
|intersection| = 1

p(sum is 7 and blue die is 3) = 1/36
p(sum is 7) p(blue die is 3) = 6/36 × 6/36 = 1/36

Thus, p((sum is 7) and (blue die is 3)) = p(sum is 7) p(blue die is 3), so the events “sum is 7” and “the blue die is 3” are independent.
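The counting argument above can be reproduced by enumerating the 36 equally likely outcomes. This is a sketch in Python; the helper `prob` and the event function names are illustrative, and each outcome is assumed to be a (red, blue) pair.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, blue) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as (favourable outcomes) / |S|."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def sum_is_7(o):
    return o[0] + o[1] == 7

def blue_is_3(o):
    return o[1] == 3

p_joint = prob(lambda o: sum_is_7(o) and blue_is_3(o))
p_product = prob(sum_is_7) * prob(blue_is_3)

# Independence holds exactly when the joint equals the product.
independent = p_joint == p_product
print(p_joint, p_product, independent)
```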
![Page 30: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/30.jpg)
Conditional Probability
• Let E, F be any events such that Pr(F) > 0.
• Then, the conditional probability of E given F, written Pr(E|F), is defined as Pr(E|F) :≡ Pr(E∩F)/Pr(F).
• This is what our probability that E would turn out to occur should be, if we are given only the information that F occurs.
• If E and F are independent then Pr(E|F) = Pr(E):
Pr(E|F) = Pr(E∩F)/Pr(F) = Pr(E)×Pr(F)/Pr(F) = Pr(E)
![Page 31: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/31.jpg)
Visualizing Conditional Probability
• If we are given that event F occurs, then
– Our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing F) corresponds to the fraction of F where E occurs also.
• Thus, p′(E) = p(E∩F)/p(F).

[Figure: Venn diagram of the entire sample space S, showing event E, event F, and their intersection E∩F.]
![Page 32: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/32.jpg)
Conditional Probability Example
• Suppose I choose a single letter out of the 26-letter English alphabet, totally at random.
– Use the Laplacian assumption on the sample space {a,b,..,z}.
– What is the (prior) probability that the letter is a vowel?
• Pr[Vowel] = __ / __ .
• Now, suppose I tell you that the letter chosen happened to be in the first 9 letters of the alphabet.
– Now, what is the conditional (or posterior) probability that the letter is a vowel, given this information?
• Pr[Vowel | First9] = ___ / ___ .

[Figure: sample space S of the letters a through z, with the sets “1st 9 letters” and “vowels” marked.]
![Page 33: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/33.jpg)
Example
• If we flip a coin three times, what is the probability that we get an odd number of tails (event E), given that event F, “the first flip comes up tails”, occurs?

(TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH)

Given F, only the four outcomes (TTT), (TTH), (THH), (THT) remain, each with conditional probability 1/4. Of these, (TTT) and (THH) have an odd number of tails, so
p(E|F) = 1/4 + 1/4 = ½,
or, from the definition, p(E|F) = p(E∩F)/p(F) = (2/8)/(4/8) = ½.
For comparison, p(E) = 4/8 = ½, so E and F are independent, since p(E|F) = p(E).
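The conditional probability in this example can be verified by enumeration. Below is a minimal Python sketch that applies the definition p(E|F) = p(E∩F)/p(F) to the eight equally likely outcomes; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three flips, e.g. "HTT".
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def odd_tails(o):        # event E
    return o.count("T") % 2 == 1

def first_is_tails(o):   # event F
    return o[0] == "T"

p_f = prob(first_is_tails)
p_e_and_f = prob(lambda o: odd_tails(o) and first_is_tails(o))
p_e_given_f = p_e_and_f / p_f   # definition of conditional probability
p_e = prob(odd_tails)

print(p_e_given_f, p_e)
```

Since the two printed values agree, E and F are independent in this example.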
![Page 34: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/34.jpg)
Example: Two boxes with balls
• Two boxes: the first contains 2 blue and 7 red balls; the second contains 4 blue and 3 red balls.
• Bob selects a ball by first choosing one of the two boxes, and then one ball from this box.
• If Bob has selected a red ball, what is the probability that he selected a ball from the first box?
• An event E: Bob has chosen a red ball.
• An event F: Bob has chosen a ball from the first box.
• We want to find p(F | E).
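One way to work this out is with Bayes' rule. The sketch below is in Python and rests on an assumption the slide does not state: Bob picks each box with probability 1/2 and then picks a ball uniformly at random from that box. The variable names are illustrative.

```python
from fractions import Fraction

# ASSUMPTION (not stated on the slide): each box is chosen with probability 1/2,
# and the ball is drawn uniformly at random from the chosen box.
p_f = Fraction(1, 2)          # P(F): first box chosen
p_not_f = 1 - p_f             # P(not F): second box chosen

p_e_given_f = Fraction(7, 9)      # P(E|F): red from box 1 (2 blue, 7 red)
p_e_given_not_f = Fraction(3, 7)  # P(E|not F): red from box 2 (4 blue, 3 red)

# Marginalize to get P(E), then apply Bayes' rule for P(F|E).
p_e = p_e_given_f * p_f + p_e_given_not_f * p_not_f
p_f_given_e = p_e_given_f * p_f / p_e

print(p_e, p_f_given_e)
```

Under that assumption the answer comes out to 49/76, a little under two thirds.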
![Page 35: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/35.jpg)
What’s behind door number three?
• The Monty Hall problem paradox
– Consider a game show where a prize (a car) is behind one of three doors
– The other two doors do not have prizes (goats instead)
– After you pick one of the doors, the host (Monty Hall) opens a different door to show you that the door he opened is not the prize
– Do you change your decision?
• Your initial probability to win (i.e. pick the right door) is 1/3
• What is your chance of winning if you change your choice after Monty opens a wrong door?
• After Monty opens a wrong door, if you change your choice, your chance of winning is 2/3
– Thus, your chance of winning doubles if you change
– Huh?
![Page 36: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/36.jpg)
![Page 37: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/37.jpg)
Monty Hall Problem
Ci - The car is behind Door i, for i equal to 1, 2 or 3, with prior P(Ci) = 1/3.

Hij - The host opens Door j after the player has picked Door i, for i and j equal to 1, 2 or 3.

Without loss of generality, assume, by re-numbering the doors if necessary, that the player picks Door 1, and that the host then opens Door 3, revealing a goat. In other words, the host makes proposition H13 true.

Then the posterior probability of winning by not switching doors is P(C1|H13).
![Page 38: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/38.jpg)
The probability of winning by switching is P(C2|H13), since under our assumption switching means switching the selection to Door 2, and P(C3|H13) = 0 (the host will never open the door with the car).

$$P(C_2 \mid H_{13}) = \frac{P(H_{13} \mid C_2)\,P(C_2)}{P(H_{13})} = \frac{P(H_{13} \mid C_2)\,P(C_2)}{\sum_{i=1}^{3} P(H_{13} \mid C_i)\,P(C_i)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3}} = \frac{2}{3}$$

The posterior probability of winning by not switching doors is P(C1|H13) = 1/3.
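The Bayes' rule computation for the Monty Hall problem can be reproduced with exact fractions. This Python sketch uses the likelihoods that appear in the calculation (1/2, 1, and 0); the value 1/2 for P(H13|C1) reflects the assumption that the host picks at random between the two goat doors when the player has chosen the car. The dictionary names are illustrative.

```python
from fractions import Fraction

# Prior P(C_i): the car is equally likely to be behind each door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood P(H13 | C_i): player picked Door 1, host opens Door 3.
likelihood = {
    1: Fraction(1, 2),  # car behind Door 1: host chooses between Doors 2 and 3
    2: Fraction(1),     # car behind Door 2: host must open Door 3
    3: Fraction(0),     # car behind Door 3: host never reveals the car
}

# Evidence P(H13) by the sum and product rules.
evidence = sum(likelihood[i] * prior[i] for i in prior)

# Posterior P(C_i | H13) by Bayes' rule.
posterior = {i: likelihood[i] * prior[i] / evidence for i in prior}

print(posterior[1], posterior[2], posterior[3])
```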
![Page 39: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/39.jpg)
Discrete random variables
Probabilities over discrete variables
C ∈ { Heads, Tails }

P(C=Heads) = .5

P(C=Heads) + P(C=Tails) = 1

Possible values (outcomes) are discrete
E.g., natural numbers (0, 1, 2, 3, etc.)
![Page 40: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/40.jpg)
Terminology
• A (stochastic) experiment is a procedure that yields one of a given set of possible outcomes.
• The sample space S of the experiment is the set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a real value to each outcome of an experiment.

Normally, a probability is related to an experiment or a trial.

Take flipping a coin, for example: what are the possible outcomes? Heads or tails (the front or back side of the coin) will be shown upwards. After a sufficient number of tosses, we can “statistically” conclude that the probability of heads is 0.5.
In rolling a die, there are 6 outcomes. Suppose we want to calculate the probability of the event that the die shows an odd number. What is that probability?
![Page 41: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/41.jpg)
Random Variables
• A “random variable” V is any variable whose value is unknown, or whose value depends on the precise situation.
– E.g., the number of students in class today
– Whether it will rain tonight (Boolean variable)
• The proposition V=vi may have an uncertain truth value, and may be assigned a probability.
![Page 42: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/42.jpg)
Example
• A fair coin is flipped 3 times. Let S be the sample space of 8 possible outcomes, and let X be a random variable that assigns to an outcome the number of heads in this outcome.
• Random variable X is a function X: S → X(S), where X(S) = {0, 1, 2, 3} is the range of X, which is the number of heads, and
S = { (TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH) }
• X(TTT) = 0
X(TTH) = X(HTT) = X(THT) = 1
X(HHT) = X(THH) = X(HTH) = 2
X(HHH) = 3
• The probability distribution of random variable X (a probability mass function, since X is discrete) is given by P(X=3) = 1/8, P(X=2) = 3/8, P(X=1) = 3/8, P(X=0) = 1/8.
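The distribution of X can be built directly from the definition of a random variable as a function on the sample space. The following Python sketch enumerates S, applies X to every outcome, and counts; the names `sample_space`, `X`, and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: the 8 equally likely outcomes of three fair coin flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# Count how many outcomes map to each value of X, then normalize by |S|.
counts = Counter(X(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X={x}) = {p}")
```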
![Page 43: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/43.jpg)
Experiments & Sample Spaces
• A (stochastic) experiment is any process by which a given random variable V gets assigned some particular value, and where this value is not necessarily known in advance.
– We call it the “actual” value of the variable, as determined by that particular experiment.
• The sample space S of the experiment is just the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific value vi of the random variable that is selected.
![Page 44: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/44.jpg)
Events
• An event E is any set of possible outcomes in S…
– That is, E ⊆ S = dom[V].
• E.g., the event that “less than 50 people show up for our next class” is represented as the set {1, 2, …, 49} of values of the variable V = (# of people here next class).
• We say that event E occurs when the actual value of V is in E, which may be written V∈E.
– Note that V∈E denotes the proposition (of uncertain truth) asserting that the actual outcome (value of V) will be one of the outcomes in the set E.
![Page 45: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/45.jpg)
Probability of an event E
The probability of an event E is the sum of the probabilities of the outcomes in E. That is,

$$p(E) = \sum_{s \in E} p(s)$$

Note that, if there are n outcomes in the event E, that is, if E = {a1,a2,…,an}, then

$$p(E) = \sum_{i=1}^{n} p(a_i)$$
![Page 46: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/46.jpg)
Example
• If we flip a coin three times, what is the probability that we get an odd number of tails?

(TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH)

Each outcome has probability 1/8, so
p(odd number of tails) = 1/8 + 1/8 + 1/8 + 1/8 = ½
![Page 47: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/47.jpg)
Venn Diagram
Experiment: Toss 2 coins. Note faces.
Sample space: S = {HH, HT, TH, TT}
[Figure: Venn diagram of S showing the four outcomes HH, HT, TH, TT, with the event “Tail” marked.]
![Page 48: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/48.jpg)
Discrete Probability Distribution (also called probability mass function (pmf))
1. List of All possible [x, p(x)] pairs
– x = Value of Random Variable (Outcome)
– p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1
![Page 49: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/49.jpg)
Visualizing Discrete Probability Distributions
Listing:
{ (0, .25), (1, .50), (2, .25) }

Table:

| # Tails | f(x) Count | p(x) |
|---|---|---|
| 0 | 1 | .25 |
| 1 | 2 | .50 |
| 2 | 1 | .25 |

Graph: [Figure: bar chart of p(x) against x = 0, 1, 2, with the vertical axis running from .00 to .50.]

Equation:

$$p(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
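The listing, table, and equation describe the same distribution. As a sketch, the binomial formula can be evaluated in Python for the two-coin case, assuming n = 2 tosses and p = 0.5 for tails (values inferred from the table, where x counts tails); the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed parameters for the two-coin example: n = 2 tosses, p = 0.5 for tails.
n, p = 2, 0.5
listing = [(x, binomial_pmf(x, n, p)) for x in range(n + 1)]

print(listing)
```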
![Page 50: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/50.jpg)
• Marginal Probability
• Conditional Probability
• Joint Probability

N is the total number of trials and nij is the number of instances where X=xi and Y=yj
![Page 51: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/51.jpg)
• Sum Rule
Product Rule
![Page 52: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/52.jpg)
The Rules of Probability
• Sum Rule
• Product Rule
![Page 53: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/53.jpg)
Continuous variables
Probability Distribution Function (PDF), a.k.a. “marginal probability”

P(a ≤ x ≤ b) = ∫_a^b p(x) dx

[Figure: plot of the density p(x) against x.]

Notation: P(x) is a probability, p(x) is a PDF
![Page 54: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/54.jpg)
Continuous variables
Probability Distribution Function (PDF)

Let x ∈ R.
p(x) can be any function s.t.
∫_{−∞}^{∞} p(x) dx = 1
p(x) ≥ 0

Define P(a ≤ x ≤ b) = ∫_a^b p(x) dx
![Page 55: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/55.jpg)
Continuous Prob. Density Function
1. Mathematical Formula
2. Shows All Values, x, and Frequencies, f(x)
– f(x) Is Not Probability
3. Properties
– ∫ f(x) dx = 1 over all x (Area Under Curve)
– f(x) ≥ 0, a ≤ x ≤ b

[Figure: curve of f(x) against x between a and b, with a point on the curve labelled (Value, Frequency).]
![Page 56: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/56.jpg)
Continuous Random Variable Probability
Probability Is Area Under Curve!

P(c ≤ x ≤ d) = ∫_c^d f(x) dx

[Figure: curve of f(x) against X, with the interval from c to d marked.]
![Page 57: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/57.jpg)
Probability mass function
In probability theory, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. A pmf differs from a probability density function (pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range.
Example graphs of pmfs. All the values of a pmf must be non-negative and sum up to 1. (right) The pmf of a fair die. (All the numbers on the die have an equal chance of appearing on top when the die is rolled.)
![Page 58: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/58.jpg)
Suppose that X is a discrete random variable, taking values on some countable sample space S ⊆ R. Then the probability mass function fX(x) for X is given by

$$f_X(x) = \begin{cases} \Pr(X = x), & x \in S, \\ 0, & x \in \mathbb{R} \setminus S. \end{cases}$$

Note that this explicitly defines fX(x) for all real numbers, including all values in R that X could never take; indeed, it assigns such values a probability of zero.

Example. Suppose that X is the outcome of a single coin toss, assigning 0 to tails and 1 to heads. The probability that X = x is 0.5 on the state space {0, 1} (this is a Bernoulli random variable), and hence the probability mass function is

$$f_X(x) = \begin{cases} \tfrac{1}{2}, & x \in \{0, 1\}, \\ 0, & x \in \mathbb{R} \setminus \{0, 1\}. \end{cases}$$
![Page 59: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/59.jpg)
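Uniform Distribution
1. Equally Likely Outcomes
2. Probability Density

$$f(x) = \frac{1}{d - c}, \qquad c \le x \le d$$

3. Mean & Standard Deviation

$$\mu = \frac{c + d}{2}, \qquad \sigma = \frac{d - c}{\sqrt{12}}$$

[Figure: rectangle of height 1/(d − c) over the interval from c to d on the x axis, with the mean and median marked.]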
![Page 60: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/60.jpg)
Uniform Distribution Example
• You’re production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive.
• Suppose the amount dispensed has a uniform distribution.
• What is the probability that less than 11.8 oz. is dispensed?
![Page 61: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/61.jpg)
Uniform Distribution Solution
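$$f(x) = \frac{1}{d - c} = \frac{1}{12.5 - 11.5} = \frac{1}{1} = 1.0$$

P(11.5 ≤ x ≤ 11.8) = (Base)(Height) = (11.8 − 11.5)(1) = 0.30

[Figure: uniform density f(x) of height 1.0 over the interval from 11.5 to 12.5, with the region from 11.5 to 11.8 marked.]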
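The base-times-height calculation generalizes to any interval. Here is a minimal Python sketch; the function `uniform_interval_prob` is a hypothetical helper written for this example, and it clips the requested interval to the support [c, d].

```python
def uniform_interval_prob(a, b, c, d):
    """P(a <= X <= b) for X uniform on [c, d]: base times height 1/(d - c)."""
    lo, hi = max(a, c), min(b, d)   # clip the interval to the support
    if hi <= lo:
        return 0.0
    return (hi - lo) / (d - c)

# Bottling example: amount dispensed is uniform on [11.5, 12.5] oz.
p_less_than_11_8 = uniform_interval_prob(11.5, 11.8, 11.5, 12.5)
print(round(p_less_than_11_8, 2))
```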
![Page 62: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/62.jpg)
Normal Distribution
1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
– Example: Binomial
3. Basis for Classical Statistical Inference
4. A.k.a. Gaussian distribution
![Page 63: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/63.jpg)
Normal Distribution
1. ‘Bell-Shaped’ & Symmetrical
2. Mean, Median, Mode Are Equal
4. Random Variable Has Infinite Range

* light-tailed distribution

[Figure: bell curve of f(X) against X, with the mean marked.]
![Page 64: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/64.jpg)
Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

f(x) = Frequency of Random Variable x
σ = Population Standard Deviation
π = 3.14159; e = 2.71828
x = Value of Random Variable (−∞ < x < ∞)
μ = Population Mean
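As a minimal sketch, the density can be written directly from its definition, f(x) = exp(-((x - μ)/σ)² / 2) / (σ √(2π)). The function name `normal_pdf` and the parameter names `mu` and `sigma` are chosen here for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The curve peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)
left = normal_pdf(-1.3, mu=0.0, sigma=1.0)
right = normal_pdf(1.3, mu=0.0, sigma=1.0)
print(round(peak, 4))  # 0.3989
```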
![Page 65: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/65.jpg)
![Page 66: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/66.jpg)
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves labeled A, B, and C, plotted as f(X) versus X]
![Page 67: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/67.jpg)
Normal Distribution Probability
$$P(c \le x \le d) = \int_{c}^{d} f(x)\,dx = \;?$$

[Figure: normal curve f(x) versus x, with c and d marked on the x axis]
Probability is area under curve!
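One way to see "probability is area under the curve" is to approximate the integral numerically. The sketch below uses a simple midpoint rule; the helper names and the choice of 10,000 strips are assumptions made for illustration, not part of the slides.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_normal(c, d, mu, sigma, n=10_000):
    """Midpoint-rule approximation of the integral of f(x) from c to d."""
    width = (d - c) / n
    total = 0.0
    for i in range(n):
        midpoint = c + (i + 0.5) * width
        total += normal_pdf(midpoint, mu, sigma)
    return total * width


# Probability of landing within one standard deviation of the mean.
area = area_under_normal(-1.0, 1.0, mu=0.0, sigma=1.0)
print(f"{area:.4f}")  # 0.6827
```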
![Page 68: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/68.jpg)
Infinite Number of Tables
[Figure: several normal curves plotted as f(X) versus X]
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
That’s an infinite number!
![Page 69: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/69.jpg)
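Standardize the Normal Distribution

$$Z = \frac{X - \mu}{\sigma}$$

[Figure: a Normal Distribution in X with mean μ and standard deviation σ, mapped to the Standardized Normal Distribution in Z with μ = 0 and σ = 1]
One table!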
![Page 70: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/70.jpg)
Intuitions on Standardizing
• Subtracting μ from each value X just moves the curve around, so values are centered on 0 instead of on μ
• Once the curve is centered, dividing each value by σ > 1 moves all values toward 0, pressing the curve
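The two steps can be tried on a small data set. This is an illustrative sketch only: the numbers are invented, and the mean and standard deviation are computed from the data itself (treating it as the whole population) rather than given in advance.

```python
from statistics import mean, pstdev

# Hypothetical values, chosen so the mean is 5 and the population std. dev. is 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = mean(values)
sigma = pstdev(values)

# Step 1: subtract the mean, so the values are centered on 0.
centered = [x - mu for x in values]

# Step 2: divide by sigma (> 1 here), pulling every value toward 0.
z_scores = [x / sigma for x in centered]

print(mean(centered))    # 0.0
print(pstdev(z_scores))  # 1.0
```

After both steps the values have mean 0 and standard deviation 1, which is why a single table is enough.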
![Page 71: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/71.jpg)
Standardizing Example
Normal Distribution: μ = 5, σ = 10, X = 6.2

$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$$
![Page 72: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/72.jpg)
Standardizing Example
Normal Distribution: μ = 5, σ = 10, X = 6.2

$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$$

Standardized Normal Distribution: μ = 0, σ = 1, Z = 0.12
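The standardizing example can be reproduced in code, with the standard normal table replaced by a formula. As a sketch under that assumption, the cumulative probability is computed from the error function, Φ(z) = (1 + erf(z/√2)) / 2; the function names are illustrative.

```python
import math


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for the standardized normal distribution (mean 0, std. dev. 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values from the example: mean 5, standard deviation 10, X = 6.2.
z = standardize(6.2, mu=5.0, sigma=10.0)
p_below = standard_normal_cdf(z)
print(round(z, 2))        # 0.12
print(round(p_below, 4))  # 0.5478
```

The second number is what a standard normal table would give for P(Z ≤ 0.12), and it equals P(X ≤ 6.2) for the original distribution.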
![Page 73: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/73.jpg)
Why use Gaussians?
• Convenient analytic properties
• Central Limit Theorem
• Works well
• Not for everything, but a good building block
• For more reasons, see [Bishop 1995, Jaynes 2003]
![Page 74: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/74.jpg)
Rules for continuous PDFs
Same intuitions and rules apply
“Sum rule”: $\int_{-\infty}^{\infty} p(x)\,dx = 1$
Product rule: $p(x,y) = p(x|y)\,p(y)$
Marginalizing: $p(x) = \int p(x,y)\,dy$
… Bayes’ Rule, conditioning, etc.
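These rules can be checked numerically on a small model. The sketch below assumes, purely for illustration, that y ~ N(0, 1) and x given y ~ N(y, 1). It builds the joint density with the factorization p(x, y) = p(x|y) p(y) and integrates out y with a midpoint rule. For this model the exact marginal of x is known to be normal with mean 0 and standard deviation √2, which gives something to compare against.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint(x, y):
    """Product rule: p(x, y) = p(x | y) * p(y) for the assumed model."""
    return normal_pdf(x, y, 1.0) * normal_pdf(y, 0.0, 1.0)


def marginal_x(x, lo=-10.0, hi=10.0, n=4000):
    """Marginalizing: p(x) = integral of p(x, y) dy, by the midpoint rule."""
    width = (hi - lo) / n
    return sum(joint(x, lo + (i + 0.5) * width) for i in range(n)) * width


x0 = 0.5
numeric = marginal_x(x0)
exact = normal_pdf(x0, 0.0, math.sqrt(2.0))
print(abs(numeric - exact) < 1e-8)  # True
```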
![Page 75: Probability Theory Longin Jan Latecki Temple University Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop.](https://reader033.fdocuments.in/reader033/viewer/2022052917/5a4d1b847f8b9ab0599bc26b/html5/thumbnails/75.jpg)
Multivariate distributions
Uniform: x ∼ U(dom)
Gaussian: x ∼ N(μ, Σ)