PROBABILITY THEORY 1. Basicsee162.caltech.edu/notes/lect1.pdf · Probability theory deals with the...
Transcript of PROBABILITY THEORY 1. Basicsee162.caltech.edu/notes/lect1.pdf · Probability theory deals with the...
1
1. Basics
Probability theory deals with the study of random
phenomena, which under repeated experiments yield
different outcomes that have certain underlying patterns
about them. The notion of an experiment assumes a set of
repeatable conditions that allow any number of identical
repetitions. When an experiment is performed under these
conditions, certain elementary events occur in different
but completely uncertain ways. We can assign nonnegative
number as the probability of the event in various
ways:),( iP ξ
iξ
iξ
PROBABILITY THEORY
2
Laplace’s Classical Definition: The Probability of an event A is defined a-priori without actual experimentation as
provided all these outcomes are equally likely.
Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball. Probability of “selecting a white ball”
We can use above classical definition to determine the probability that a given number is divisible by a prime p.
, outcomes possible ofnumber Total
tofavorable outcomes ofNumber )(
AAP =
.mn
n
+=
(1-1)
3
If p is a prime number, then every pth number (starting with p) is divisible by p. Thus among p consecutive integersthere is one favorable outcome, and hence
Relative Frequency Definition: The probability of an event A is defined as
where nA is the number of occurrences of A and n is the total number of trials.We can use the relative frequency definition to derive (1-2) as well. To do this we argue that among the integers
the numbers are divisible by p.
1 P a given number is divisible by a prime p
p= (1-2)
n
nAP A
n lim)(
∞→= (1-3)
1, 2, 3, , ,n 2, ,p p
4
Thus there are n/p such numbers between 1 and n. Hence
In a similar manner, it follows that
and
The axiomatic approach to probability, due to Kolmogorov,developed through a set of axioms (below) is generally recognized as superior to the above definitions, (1-1) and (1-3), as it provides a solid foundation for complicated applications.
/ 1
.
limn
n pn p
P a given number N is divisible by a prime p
→∞== (1-4)
22
1 P p divides any given number N
p=
(1-6)
(1-5)
1 .P pq divides any given number N
pq=
5
The totality of all known a priori, constitutes a set Ω, the set of all experimental outcomes.
Ω has subsets Recall that if A is a subset of Ω, then implies From A and B, we can generate other related subsets etc.
,iξ
,,,, 21 kξξξ=Ω (1-7)
A∈ξ .Ω∈ξ.,,, CBA
, , , , BABABA ∩∪
(1-8)
and
BABA
BABA
∈∈=∩∈∈=∪ξξξ
ξξξ and |
or |
AA ∉= ξξ |
6
A B
BA ∪
A B A
BA ∩ A
• If the empty set, then A and B are
said to be mutually exclusive (M.E).
• A partition of Ω is a collection of mutually exclusive
subsets of Ω such that their union is Ω.
,φ=∩BA
. and ,1
Ω==∩=∪i
iji AAA φ
BA
φ=∩ BA
1A2A
nA
iA
A
(1-9)
jA
Fig. 1.2
Fig.1.1
7
De-Morgan’s Laws:
BABABABA ∪=∩∩=∪ ;
A B
BA ∪
A B
BA ∪
A B
BA ∩
A B
• Often it is meaningful to talk about at least some of the subsets of Ω as events, for which we must have mechanismto compute their probabilities.
Example 1.1: Consider the experiment where two coins are simultaneously tossed. The various elementary events are
Fig.1.3
(1-10)
8
. ,,, 4321 ξξξξ=Ω
),( ),,( ),,( ),,( 4321 TTHTTHHH ==== ξξξξ
and
The subset is the same as “Head has occurred at least once” and qualifies as an event.
Suppose two subsets A and B are both events, then consider
“Does an outcome belong to A or B ”
“Does an outcome belong to A and B ”
“Does an outcome fall outside A”?
,, 321 ξξξ=A
BA∪=
BA∩=
9
Thus the sets etc., also qualify as events. We shall formalize this using the notion of a Field.
• Field: A collection of subsets of a nonempty set Ω forms a field F if
Using (i) - (iii), it is easy to show that etc.,
also belong to F. For example, from (ii) we have
and using (iii) this gives
applying (ii) again we get where we
have used De Morgan’s theorem in (1-10).
, , , , BABABA ∩∪
. then, and If (iii)
then, If (ii)
(i)
FBAFBFA
FAFA
F
∈∪∈∈∈∈
∈Ω
, , BABA ∩∩
, , FBFA ∈∈ ; FBA ∈∪
, FBABA ∈∩=∪
(1-11)
10
Thus if then
From here on wards, we shall reserve the term ‘event’only for members of F.
Assuming that the probability of elementary outcomes of Ω are apriori defined, how does one assign probabilities to more ‘complicated’ events such as A, B, AB, etc., above?
The three axioms of probability defined below can be used to achieve that goal.
, , FBFA ∈∈
. ,,,,,,,, BABABABABAF ∪∩∪Ω=
)( ii Pp ξ=iξ
(1-12)
11
Axioms of Probability
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions that act as axioms of probability.
(Note that (iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities.)
).()()( then, If (iii)
unity) isset whole theofty (Probabili 1)( (ii)
number) enonnegativa isty (Probabili 0)( (i)
BPAPBAPBA
P
AP
+=∪=∩=Ω≥
φ(1-13)
12
The following conclusions follow from these axioms:
a. Since we have using (ii)
But and using (iii),
b. Similarly, for any A,
Hence it follows that
But and thus
c. Suppose A and B are not mutually exclusive (M.E.)
How does one compute
, Ω=∪ AA
.1)() P( =Ω=∪ PAA
, φ∈∩ AA
).(1)or P( 1)P()() P( APAAAPAA −==+=∪ (1-14)
. φφ =∩A
( ) . )()( φφ PAPAP +=∪
, AA =∪ φ .0 =φP (1-15)
?)( =∪ BAP
13
To compute the above probability, we should re-expressin terms of M.E. sets so that we can make use of
the probability axioms. From Fig.1.4 we have
where A and are clearly M.E. events. Thus using axiom (1-13-iii)
To compute we can express B as
Thus
since and are M.E. events.
BA ∪
, BAABA ∪=∪ (1-16)
).()()()( BAPAPBAAPBAP +=∪=∪),( BAP
ABBAABAB
AABBB
∪=∩∪∩=
∪∩=Ω∩=
)()(
)(
),()()( ABPBAPBP +=
ABBA = BAAB =
BA
(1-17)
(1-18)
(1-19)
A BA
BA ∪ Fig.1.4
14
From (1-19),
and using (1-20) in (1-17)
• Question: Suppose every member of a denumerably
infinite collection Ai of pair wise disjoint sets is an
event, then what can we say about their union
i.e., suppose all what about A? Does it
belong to F?
Further, if A also belongs to F, what about P(A)?
)()()( ABPBPBAP −=
).()()()( ABPBPAPBAP −+=∪
?1∪
∞
=
=i
iAA
,FAi ∈
(1-22)
(1-20)
(1-21)
(1-23)
(1-24)
15
The above questions involving infinite sets can only be settled using our intuitive experience from plausible experiments. For example, in a coin tossing experiment, where the same coin is tossed indefinitely, define
A = “head eventually appears”.
Is A an event? Our intuitive experience surely tells us that A is an event. Let
Clearly Moreover the above A is
,,,,,
tossth theon 1st time for the appears head
1
htttt
nA
n
n
−
==
(1-26)
.φ=∩ ji AA
.321 ∪∪∪∪∪= iAAAAA (1-27)
(1-25)
16
We cannot use probability axiom (1-13-iii) to compute P(A), since the axiom only deals with two (or a finite number) of M.E. events.
To settle both questions above (1-23)-(1-24), extension of these notions must be done, based on our intuition, as new axioms.
• σ-Field (Definition):
A field F is a σ-field if in addition to the three conditions in (1-11), we have the following:
For every sequence of pair wise disjoint events belonging to F, their union also belongs to F, i.e.,
,1, ∞→=iAi
.1
FAAi
i ∈=∞
=∪ (1-28)
17
In view of (1-28), we can add yet another axiom to the set of probability axioms in (1-13).
(iv) If Ai are pair wise mutually exclusive, then
Returning back to the coin tossing experiment, from experience we know that if we keep tossing a coin, eventually, a head must show up, i.e.,
But and using the fourth probability axiomin (1-29),
).(11
∑∞
=
∞
=
=⎟⎟⎠
⎞⎜⎜⎝
⎛
nn
nn APAP ∪ (1-29)
.1)( =AP (1-30)
∪∞
=
=1
,n
nAA
).()(11
∑∞
=
∞
=
=⎟⎟⎠
⎞⎜⎜⎝
⎛=n
nn
n APAPAP ∪ (1-31)
18
From (1-26), for a fair coin since only one in 2n outcomes is in favor of An , we have
which agrees with (1-30), thus justifying the
‘reasonableness’ of the fourth axiom in (1-29).
In summary, the triplet (Ω, F, P) composed of a nonempty set Ω of elementary events, a σ-field F of subsets of Ω, and a probability measure P on the sets in F subject the four axioms ((1-13) and (1-29)) form a probability model.
The probability of more complicated events must follow from this framework by deduction.
,12
1)( and
2
1)(
11
=== ∑∑∞
=
∞
= nn
nnnn APAP (1-32)
19
Conditional Probability and Independence
In N independent trials, suppose NA, NB, NAB denote the number of times events A, B and AB occur respectively. According to the frequency interpretation of probability, for large N
Among the NA occurrences of A, only NAB of them are also found among the NB occurrences of B. Thus the ratio
.)( ,)( ,)(N
NABP
N
NBP
N
NAP ABBA ≈≈≈ (1-33)
)(
)(
/
/
BP
ABP
NN
NN
N
N
B
AB
B
AB == (1-34)
20
is a measure of “the event A given that B has already occurred”. We denote this conditional probability by
P(A|B) = Probability of “the event A given
that B has occurred”.
We define
provided As we show below, the above definition
satisfies all probability axioms discussed earlier.
,)(
)()|(
BP
ABPBAP =
.0)( ≠BP
(1-35)
21
We have
(i)
(ii) since Ω B = B.
(iii) Suppose Then
But hence
satisfying all probability axioms in (1-13). Thus (1-35) defines a legitimate probability measure.
,00)(
0)()|( ≥
>≥=
BP
ABPBAP
.)(
)(
)(
))(()|(
BP
CBABP
BP
BCAPBCAP
∪=∩∪=∪
,1)(
)(
)(
)()|( ==Ω=Ω
BP
BP
BP
BPBP
),|()|()(
)(
)(
)()|( BCPBAP
BP
CBP
BP
ABPBCAP +=+=∪
.0=∩ CA
,φ=∩ ACAB ).()()( CBPABPCBABP +=∪
(1-39)
(1-37)
(1-36)
(1-38)
22
Properties of Conditional Probability:
a. If and
since if then occurrence of B implies automatic occurrence of the event A. As an example:
in a dice tossing experiment. Then and
b. If and
, , BABAB =⊂
1)(
)(
)(
)()|( ===
BP
BP
BP
ABPBAP (1-40)
,AB ⊂
).()(
)(
)(
)()|( AP
BP
AP
BP
ABPBAP >==
, , AABBA =⊂
(1-41)
,AB ⊂ .1)|( =BAP
outcome is even, =outcome is 2,A B=
23
(In a dice experiment,
so that The statement that B has occurred (outcome
is even) makes the odds for “outcome is 2” greater than
without that information).
c. We can use the conditional probability to express the
probability of a complicated event in terms of “simpler”
related events.
Let are pair wise disjoint and their union is Ω.
Thus and
Thus
.BA⊂
.1
Ω==∪
n
iiA
nAAA ,,, 21
,φ=ji AA
.)( 2121 nn BABABAAAABB ∪∪∪=∪∪∪=
(1-42)
(1-43)
outcome is 2, =outcome is even,A B=
24
But so that from (1-43)
With the notion of conditional probability, next we
introduce the notion of “independence” of events.
Independence: A and B are said to be independent events,
if
Notice that the above definition is a probabilistic statement,
not a set theoretic notion such as mutually exclusiveness.
).()()( BPAPABP ⋅= (1-45)
,φφ =∩⇒=∩ jiji BABAAA
∑∑==
==n
iii
n
ii APABPBAPBP
11
).()|()()( (1-44)
25
Suppose A and B are independent, then
Thus if A and B are independent, the event that B has occurred does not shed any more light into the event A. It makes no difference to A whether B has occurred or not. An example will clarify the situation:
Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?
Let W1 = “first ball removed is white”
B2 = “second ball removed is black”
).()(
)()(
)(
)()|( AP
BP
BPAP
BP
ABPBAP === (1-46)
26
We need We have Using the conditional probability rule,
But
and
and hence
?)( 21 =∩ BWP
).()|()()( 1121221 WPWBPWBPBWP ==
,53
106
466
)( 1 ==+
=WP
,94
454
)|( 12 =+
=WBP
1 2
3 4 12( ) 0.27.
5 9 45P W B = ⋅ = ≈
.122121 WBBWBW ==∩
(1-47)
27
Are the events W1 and B2 independent? Our common sense says No. To verify this we need to compute P(B2). Of course the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = “first ball is white” or B1= “first ball is black”. Note that and Hence W1 together with B1 form a partition. Thus (see (1-42)-(1-44))
and
As expected, the events W1 and B2 are dependent.
,11 φ=∩ BW
.11 Ω=∪ BW
2 2 1 1 2 1 1( ) ( | ) ( ) ( | ) ( )
4 3 3 4 4 3 1 2 4 2 2 ,
5 4 5 6 3 10 9 5 3 5 15 5
P B P B W P W P B B P B= ++= ⋅ + ⋅ = ⋅ + ⋅ = =
+ +
2 1 2 1
2 3 12( ) ( ) ( ) .
5 5 45P B P W P B W= ⋅ ≠ =
28
From (1-35),
Similarly, from (1-35)
or
From (1-48)-(1-49), we get
or
Equation (1-50) is known as Bayes’ theorem.
).()|()( BPBAPABP =
,)(
)(
)(
)()|(
AP
ABP
AP
BAPABP ==
).()|()( APABPABP =
(1-48)
(1-49)
).()|()()|( APABPBPBAP =
(1-50))()(
)|()|( AP
BP
ABPBAP ⋅=
29
Although simple enough, Bayes’ theorem has an interesting interpretation: P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and Bare not independent. How can this new information be used to update our knowledge about A? Bayes’ rule in (1-50) take into account the new information (“B has occurred”) and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B. Thenew information should be used to improve our knowledge/understanding of A. Bayes’ theorem gives the exact mechanism for incorporating such new information.
30
A more general version of Bayes’ theorem involves
partition of Ω. From (1-50)
where we have made use of (1-44). In (1-51),
represent a set of mutually exclusive events with
associated a-priori probabilities With the
new information “B has occurred”, the information about
Ai can be updated by the n conditional probabilities
,)()|(
)()|(
)(
)()|()|(
1∑
=
== n
iii
iiiii
APABP
APABP
BP
APABPBAP (1-51)
,1 , niAi →=
47).-(1 using ,1 ),|( niABP i →=
.1 ),( niAP i →=
31
Example 1.3: Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.
Then
.025.0200
5)|( ,15.0
100
15)|( 21 ==== BDPBDP
32
Since a box is selected at random, they are equally likely.
Thus B1 and B2 form a partition as in (1-43), and using
(1-44) we obtain
Thus, there is about 9% probability that a bulb picked at random is defective.
.2
1)()( 21 == BPBP
.0875.021
025.021
15.0
)()|()()|()( 2211
=×+×=
+= BPBDPBPBDPDP
33
(b) Suppose we test the bulb and it is found to be defective.
What is the probability that it came from box 1?
Notice that initially then we picked out a box
at random and tested a bulb that turned out to be defective.
Can this information shed some light about the fact that we
might have picked up box 1?
From (1-52), and indeed it is more
likely at this point that we must have chosen box 1 in favor
of box 2. (Recall box1 has six times more defective bulbs
compared to box2).
.8571.00875.0
2/115.0)(
)()|()|( 11
1 =×==DP
BPBDPDBP
?)|( 1 =DBP
(1-52)
;5.0)( 1 =BP
,5.0857.0)|( 1 >=DBP