
Module 1

Probability

1. Introduction

In our daily life we come across many processes whose nature cannot be predicted in advance.

Such processes are referred to as random processes. The only way to derive information about

random processes is to conduct experiments. Each such experiment results in an outcome

which cannot be predicted beforehand. In fact even if the experiment is repeated under

identical conditions, due to presence of factors which are beyond control, outcomes of the

experiment may vary from trial to trial. However we may know in advance that each outcome

of the experiment will result in one of the several given possibilities. For example, in the cast of

a die under a fixed environment the outcome (number of dots on the upper face of the die)

cannot be predicted in advance and it varies from trial to trial. However we know in advance

that the outcome has to be one of the numbers 1, 2, …, 6. Probability theory deals with

the modeling and study of random processes. The field of Statistics is closely related to

probability theory and it deals with drawing inferences from the data pertaining to random

processes.

Definition 1.1

(i) A random experiment is an experiment in which:

(a) the set of all possible outcomes of the experiment is known in advance;

(b) the outcome of a particular performance (trial) of the experiment cannot be

predicted in advance;

(c) the experiment can be repeated under identical conditions.

(ii) The collection of all possible outcomes of a random experiment is called the sample

space. A sample space will usually be denoted by $\Omega$. ▄

Example 1.1

(i) In the random experiment of casting a die one may take the sample space as $\Omega = \{1, 2, 3, 4, 5, 6\}$, where $i \in \Omega$ indicates that the experiment results in $i$ $(i = 1, \ldots, 6)$ dots on the upper face of the die.

(ii) In the random experiment of simultaneously flipping a coin and casting a die one may take the sample space as

$$\Omega = \{H, T\} \times \{1, 2, \ldots, 6\} = \{(r, i) : r \in \{H, T\},\ i \in \{1, 2, \ldots, 6\}\},$$

where $(H, i)$ $((T, i))$ indicates that the flip of the coin resulted in a head (tail) on the upper face and the cast of the die resulted in $i$ $(i = 1, 2, \ldots, 6)$ dots on the upper face.

(iii) Consider an experiment where a coin is tossed repeatedly until a head is observed. In this case the sample space may be taken as $\Omega = \{1, 2, \ldots\}$ (or $\Omega = \{H, TH, TTH, \ldots\}$), where $i \in \Omega$ (or $TT \cdots TH \in \Omega$ with $(i - 1)$ Ts and one H) indicates that the experiment terminates on the $i$-th trial, with the first $i - 1$ trials resulting in tails on the upper face and the $i$-th trial resulting in a head on the upper face.

(iv) In the random experiment of measuring lifetimes (in hours) of a particular brand of batteries manufactured by a company one may take $\Omega = [0, 70{,}000]$, where we have assumed that no battery lasts for more than 70,000 hours. ▄

Definition 1.2

(i) Let $\Omega$ be the sample space of a random experiment and let $E \subseteq \Omega$. If the outcome of the random experiment is a member of the set $E$ we say that the event $E$ has occurred.

(ii) Two events $E_1$ and $E_2$ are said to be mutually exclusive if they cannot occur simultaneously, i.e., if $E_1 \cap E_2 = \emptyset$, the empty set. ▄

In a random experiment some events may be more likely to occur than the others. For

example, in the cast of a fair die (a die that is not biased towards any particular outcome),

the occurrence of an odd number of dots on the upper face is more likely than the

occurrence of 2 or 4 dots on the upper face. Thus it may be desirable to quantify the

likelihoods of occurrences of various events. Probability of an event is a numerical measure

of chance with which that event occurs. To assign probabilities to various events associated

with a random experiment one may assign a real number $P(E) \in [0, 1]$ to each event $E$ with

the interpretation that there is a $(100 \times P(E))\%$ chance that the event $E$ will occur and a $(100 \times (1 - P(E)))\%$ chance that the event $E$ will not occur. For example if the

probability of an event is 0.25 it would mean that there is a 25% chance that the event will

occur and that there is a 75% chance that the event will not occur. Note that, for any such

assignment of probabilities to be meaningful, one must have $P(\Omega) = 1$. Now we will discuss

two methods of assigning probabilities.

I. Classical Method

This method of assigning probabilities is used for random experiments which result in a

finite number of equally likely outcomes. Let $\Omega = \{\omega_1, \ldots, \omega_n\}$ be a finite sample space with $n\ (\in \mathbb{N})$ possible outcomes; here $\mathbb{N}$ denotes the set of natural numbers. For $E \subseteq \Omega$, let $|E|$ denote the number of elements in $E$. An outcome $\omega \in \Omega$ is said to be favorable to an event $E$ if $\omega \in E$. In the classical method of assigning probabilities, the probability of an event $E$ is given by

$$P(E) = \frac{\text{number of outcomes favorable to } E}{\text{total number of outcomes}} = \frac{|E|}{|\Omega|} = \frac{|E|}{n}.$$

Note that probabilities assigned through the classical method satisfy the following properties of intuitive appeal:

(i) For any event $E$, $P(E) \ge 0$;

(ii) For mutually exclusive events $E_1, E_2, \ldots, E_n$ (i.e., $E_i \cap E_j = \emptyset$ whenever $i, j \in \{1, \ldots, n\}$, $i \ne j$),

$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = \frac{\big|\bigcup_{i=1}^{n} E_i\big|}{n} = \frac{\sum_{i=1}^{n} |E_i|}{n} = \sum_{i=1}^{n} \frac{|E_i|}{n} = \sum_{i=1}^{n} P(E_i);$$

(iii) $P(\Omega) = \dfrac{|\Omega|}{|\Omega|} = 1$.

Example 1.2

Suppose that in a classroom we have 25 students (with registration numbers 1, 2, …, 25) born in the same year having 365 days. Suppose that we want to find the probability of the event $E$ that they all are born on different days of the year. Here an outcome consists of a sequence of 25 birthdays. Suppose that all such sequences are equally likely. Then $|\Omega| = 365^{25}$, $|E| = 365 \times 364 \times \cdots \times 341 = {}^{365}P_{25}$ and

$$P(E) = \frac{|E|}{|\Omega|} = \frac{{}^{365}P_{25}}{365^{25}}.$$

The classical method of assigning probabilities has limited applicability, as it can be used only for random experiments which result in a finite number of equally likely outcomes. ▄
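The probability in Example 1.2 can be evaluated numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `math.perm`); the function name `prob_all_distinct` is an illustrative choice, not part of the notes.

```python
from math import perm

def prob_all_distinct(n_people: int, n_days: int = 365) -> float:
    """Classical probability that n_people have pairwise different birthdays.

    Favorable outcomes: ordered selections of distinct days, nPr(n_days, n_people).
    Total outcomes: n_days ** n_people equally likely birthday sequences.
    """
    return perm(n_days, n_people) / n_days ** n_people

p = prob_all_distinct(25)
print(f"P(E) for 25 students: {p:.4f}")
```

For 25 students the value is a little above 0.43, so a shared birthday is more likely than not.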

II. Relative Frequency Method

Suppose that we have independent repetitions of a random experiment (here independent

repetitions means that the outcome of one trial is not affected by the outcome of another trial)

under identical conditions. Let $f_N(E)$ denote the number of times an event $E$ occurs in the first $N$ trials (also called the frequency of event $E$ in $N$ trials) and let $r_N(E) = f_N(E)/N$ denote the corresponding relative frequency. Using advanced probabilistic arguments (e.g., using the Weak Law of Large Numbers to be discussed in Module 7) it can be shown that, under mild conditions, the relative frequencies stabilize (in a certain sense) as $N$ gets large (i.e., for any event $E$, $\lim_{N \to \infty} r_N(E)$ exists in a certain sense). In the relative frequency method of assigning probabilities the probability of an event $E$ is given by

$$P(E) = \lim_{N \to \infty} r_N(E) = \lim_{N \to \infty} \frac{f_N(E)}{N}.$$

Figure 1.1. Plot of relative frequencies ($r_N(E)$) of number of heads against number of trials ($N$)

in the random experiment of tossing a fair coin (with probability of head in each trial as 0.5).
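The stabilization of relative frequencies can be illustrated by simulation. The sketch below assumes a pseudo-random generator as a stand-in for a fair coin; the function name `relative_frequencies`, the seed and the number of trials are illustrative choices only.

```python
import random

def relative_frequencies(n_trials: int, p_head: float = 0.5, seed: int = 0) -> list:
    """Return [r_1(E), ..., r_N(E)] for E = 'head' in simulated coin tosses."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    heads = 0                          # f_N(E): number of heads so far
    freqs = []
    for n in range(1, n_trials + 1):
        heads += rng.random() < p_head # one toss; True counts as 1
        freqs.append(heads / n)        # r_N(E) = f_N(E) / N
    return freqs

r = relative_frequencies(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>6}: r_N(E) = {r[n - 1]:.4f}")
```

For small $N$ the printed values fluctuate noticeably; for large $N$ they settle near 0.5.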

In practice, to assign probability to an event #, the experiment is repeated a large (but fixed)

number of times (say $N$ times) and the approximation $P(E) \approx r_N(E)$ is used for assigning probability to event $E$. Note that probabilities assigned through the relative frequency method also satisfy the following properties of intuitive appeal:

(i) for any event $E$, $P(E) \ge 0$;

(ii) for mutually exclusive events $E_1, E_2, \ldots, E_n$,

$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = \sum_{i=1}^{n} P(E_i);$$

(iii) $P(\Omega) = 1$.

Although the relative frequency method seems to have more applicability than the classical

method it too has limitations. A major problem with the relative frequency method is that it is

imprecise as it is based on an approximation ($P(E) \approx r_N(E)$). Another difficulty with the relative

frequency method is that it assumes that the experiment can be repeated a large number of

times. This may not be always possible due to budgetary and other constraints (e.g., in

predicting the success of a new space technology it may not be possible to repeat the

experiment a large number of times due to high costs involved).

The following definitions will be useful in future discussions.

Definition 1.3

(i) A set $E$ is said to be finite if either $E = \emptyset$ (the empty set) or if there exists a one-one and onto function $f: \{1, 2, \ldots, n\} \to E$ (or $f: E \to \{1, 2, \ldots, n\}$) for some natural number $n$;

(ii) A set is said to be infinite if it is not finite;

(iii) A set $E$ is said to be countable if either $E = \emptyset$ or if there is an onto function $f: \mathbb{N} \to E$, where $\mathbb{N}$ denotes the set of natural numbers;

(iv) A set is said to be countably infinite if it is countable and infinite;

(v) A set is said to be uncountable if it is not countable;

(vi) A set $E$ is said to be continuum if there is a one-one and onto function $f: \mathbb{R} \to E$ (or $f: E \to \mathbb{R}$), where $\mathbb{R}$ denotes the set of real numbers. ▄

The following proposition, whose proof(s) can be found in any standard textbook on set theory,

provides some of the properties of finite, countable and uncountable sets.

Proposition 1.1

(i) Any finite set is countable;

(ii) If $A$ is countable and $B \subseteq A$ then $B$ is countable;

(iii) Any uncountable set is an infinite set;

(iv) If $A$ is an infinite set and $A \subseteq B$ then $B$ is infinite;

(v) If $A$ is an uncountable set and $A \subseteq B$ then $B$ is uncountable;

(vi) If $E$ is a finite set and $F$ is a set such that there exists a one-one and onto function $f: E \to F$ (or $f: F \to E$) then $F$ is finite;

(vii) If $E$ is a countably infinite (continuum) set and $F$ is a set such that there exists a one-one and onto function $f: E \to F$ (or $f: F \to E$) then $F$ is countably infinite (continuum);

(viii) A set $E$ is countable if and only if either $E = \emptyset$ or there exists a one-one and onto map $f: E \to \mathbb{N}_0$, for some $\mathbb{N}_0 \subseteq \mathbb{N}$;

(ix) A set $E$ is countable if, and only if, either $E$ is finite or there exists a one-one and onto map $f: \mathbb{N} \to E$;

(x) A set $E$ is countable if, and only if, either $E = \emptyset$ or there exists a one-one map $f: E \to \mathbb{N}$;

(xi) A non-empty countable set $E$ can be written either as $E = \{\omega_1, \omega_2, \ldots, \omega_n\}$, for some $n \in \mathbb{N}$, or as $E = \{\omega_1, \omega_2, \ldots\}$;

(xii) The unit interval $(0, 1)$ is uncountable. Hence any interval $(a, b)$, where $-\infty < a < b < \infty$, is uncountable;

(xiii) $\mathbb{N} \times \mathbb{N}$ is countable;

(xiv) Let $\Lambda$ be a countable set and let $\{A_\alpha : \alpha \in \Lambda\}$ be a (countable) collection of countable sets. Then $\bigcup_{\alpha \in \Lambda} A_\alpha$ is countable. In other words, a countable union of countable sets is countable;

(xv) Any continuum set is uncountable. ▄

Example 1.3

(i) Define $f: \mathbb{N} \to \mathbb{N}$ by $f(n) = n$, $n \in \mathbb{N}$. Clearly $f: \mathbb{N} \to \mathbb{N}$ is one-one and onto. Thus $\mathbb{N}$ is countable. Also it can be easily seen (using the contradiction method) that $\mathbb{N}$ is infinite. Thus $\mathbb{N}$ is countably infinite.

(ii) Let $\mathbb{Z}$ denote the set of integers. Define $f: \mathbb{N} \to \mathbb{Z}$ by

$$f(n) = \begin{cases} \dfrac{n - 1}{2}, & \text{if } n \text{ is odd}, \\[2mm] -\dfrac{n}{2}, & \text{if } n \text{ is even}. \end{cases}$$

Clearly $f: \mathbb{N} \to \mathbb{Z}$ is one-one and onto. Therefore, using (i) above and Proposition 1.1 (vii), $\mathbb{Z}$ is countably infinite. Now on using Proposition 1.1 (ii) it follows that any subset of $\mathbb{Z}$ is countable.

(iii) Using the fact that $\mathbb{N}$ is countably infinite and Proposition 1.1 (xiv) it is straightforward to show that $\mathbb{Q}$ (the set of rational numbers) is countably infinite.

(iv) Define $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to (0, 1)$ by $f(x) = x$, $x \in \mathbb{R}$, and $g(x) = \dfrac{1}{1 + e^{x}}$, $x \in \mathbb{R}$. Then $f: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to (0, 1)$ are one-one and onto functions. It follows that $\mathbb{R}$ and $(0, 1)$ are continuum (using Proposition 1.1 (vii)). Further, for $-\infty < a < b < \infty$, let $h(x) = (b - a)x + a$, $x \in (0, 1)$. Clearly $h: (0, 1) \to (a, b)$ is one-one and onto. Again using Proposition 1.1 (vii) it follows that any interval $(a, b)$ is continuum. ▄
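The bijection between the natural numbers and the integers in Example 1.3 (ii) can be checked on an initial segment. This is a small sketch; the Python function name `f` mirrors the map in the example, and the check only covers finitely many values, so it illustrates the statement rather than proving it.

```python
def f(n: int) -> int:
    """Map n in {1, 2, 3, ...} to an integer: odd n -> (n - 1)/2, even n -> -n/2."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return (n - 1) // 2 if n % 2 == 1 else -(n // 2)

first_values = [f(n) for n in range(1, 8)]
print(first_values)  # [0, -1, 1, -2, 2, -3, 3]

# The first 2k + 1 naturals are mapped one-one onto {-k, ..., k}.
k = 50
image = [f(n) for n in range(1, 2 * k + 2)]
print(sorted(image) == list(range(-k, k + 1)))
```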

It is clear that it may not be possible to assign probabilities in a way that applies to every

situation. In the modern approach to probability theory one does not bother about how

probabilities are assigned. Assignment of probabilities to various subsets of the sample space $\Omega$

that is consistent with intuitively appealing properties (i)-(iii) of classical (or relative frequency)

method is done through probability modeling. In advanced courses on probability theory it is

shown that in many situations (especially when the sample space $\Omega$ is continuum) it is not possible to assign probabilities to all subsets of $\Omega$ such that properties (i)-(iii) of the classical (or relative frequency) method are satisfied. Therefore probabilities are assigned to only certain types of subsets of $\Omega$.

In the following section we will discuss the modern approach to probability theory where we will not be concerned with how probabilities are assigned to suitably chosen subsets of $\Omega$. Rather we will define the concept of probability for certain types of subsets of $\Omega$ using a set of

axioms that are consistent with properties (i)-(iii) of classical (or relative frequency) method.

We will also study various properties of probability measures.

2. Axiomatic Approach to Probability and Properties of Probability Measure

We begin this section with the following definitions.

Definition 2.1

(i) A set whose elements are themselves sets is called a class of sets. A class of sets will usually be denoted by script letters $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$. For example $\mathcal{A} = \{\{1\}, \{1, 3\}, \{2, 5, 6\}\}$;

(ii) Let $\mathcal{C}$ be a class of sets. A function $\mu: \mathcal{C} \to \mathbb{R}$ is called a set function. In other words, a real-valued function whose domain is a class of sets is called a set function. ▄

As stated above, in many situations, it may not be possible to assign probabilities to all subsets

of the sample space $\Omega$ such that properties (i)-(iii) of the classical (or relative frequency) method are satisfied. Therefore one begins with assigning probabilities to members of an appropriately chosen class $\mathcal{C}$ of subsets of $\Omega$ (e.g., if $\Omega = \mathbb{R}$, then $\mathcal{C}$ may be the class of all open intervals in $\mathbb{R}$; if $\Omega$ is a countable set, then $\mathcal{C}$ may be the class of all singletons $\{\omega\}$, $\omega \in \Omega$). We call the members of $\mathcal{C}$ basic sets. Starting from the basic sets in $\mathcal{C}$, the assignment of probabilities is extended, in an intuitively justified manner, to as many subsets of $\Omega$ as possible, keeping in mind that properties (i)-(iii) of the classical (or relative frequency) method are not violated. Let us denote by $\mathcal{F}$ the class of sets for which the probability assignments can finally be done. We call the class $\mathcal{F}$ the event space, and elements of $\mathcal{F}$ are called events. It will be reasonable to assume that $\mathcal{F}$ satisfies the following properties: (i) $\Omega \in \mathcal{F}$; (ii) $A \in \mathcal{F} \Rightarrow A^c = \Omega - A \in \mathcal{F}$; and (iii) $A_i \in \mathcal{F}$, $i = 1, 2, \ldots \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$. This leads to the introduction of the following definition.

Definition 2.2

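A sigma-field ($\sigma$-field) of subsets of $\Omega$ is a class $\mathcal{F}$ of subsets of $\Omega$ satisfying the following properties:

(i) $\Omega \in \mathcal{F}$;

(ii) $A \in \mathcal{F} \Rightarrow A^c = \Omega - A \in \mathcal{F}$ (closed under complements);

(iii) $A_i \in \mathcal{F}$, $i = 1, 2, \ldots \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closed under countably infinite unions). ▄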

Remark 2.1

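(i) We expect the event space to be a $\sigma$-field;

(ii) Suppose that $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$. Then,

(a) $\emptyset \in \mathcal{F}$ (since $\emptyset = \Omega^c$);

(b) $E_1, E_2, \ldots \in \mathcal{F} \Rightarrow \bigcap_{i=1}^{\infty} E_i \in \mathcal{F}$ (since $\bigcap_{i=1}^{\infty} E_i = \big(\bigcup_{i=1}^{\infty} E_i^c\big)^c$);

(c) $E, F \in \mathcal{F} \Rightarrow E - F = E \cap F^c \in \mathcal{F}$ and $E \Delta F \overset{\text{def}}{=} (E - F) \cup (F - E) \in \mathcal{F}$;

(d) $E_1, E_2, \ldots, E_n \in \mathcal{F}$, for some $n \in \mathbb{N}$, $\Rightarrow \bigcup_{i=1}^{n} E_i \in \mathcal{F}$ and $\bigcap_{i=1}^{n} E_i \in \mathcal{F}$ (take $E_{n+1} = E_{n+2} = \cdots = \emptyset$ so that $\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{\infty} E_i$, or $E_{n+1} = E_{n+2} = \cdots = \Omega$ so that $\bigcap_{i=1}^{n} E_i = \bigcap_{i=1}^{\infty} E_i$);

(e) although the power set of $\Omega$ is a $\sigma$-field of subsets of $\Omega$, in general a $\sigma$-field may not contain all subsets of $\Omega$. ▄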

Example 2.1

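(i) $\mathcal{F} = \{\emptyset, \Omega\}$ is a sigma-field, called the trivial sigma-field;

(ii) Suppose that $A \subseteq \Omega$. Then $\mathcal{F} = \{A, A^c, \emptyset, \Omega\}$ is a $\sigma$-field of subsets of $\Omega$. It is the smallest sigma-field containing the set $A$;

(iii) An arbitrary intersection of $\sigma$-fields is a $\sigma$-field (see Problem 3 (i));

(iv) Let $\mathcal{C}$ be a class of subsets of $\Omega$ and let $\{\mathcal{F}_\alpha : \alpha \in \Lambda\}$ be the collection of all $\sigma$-fields that contain $\mathcal{C}$. Then

$$\mathcal{F} = \bigcap_{\alpha \in \Lambda} \mathcal{F}_\alpha$$

is a $\sigma$-field and it is the smallest $\sigma$-field that contains the class $\mathcal{C}$ (it is called the $\sigma$-field generated by $\mathcal{C}$ and is denoted by $\sigma(\mathcal{C})$) (see Problem 3 (iii));

(v) Let $\Omega = \mathbb{R}$ and let $\mathcal{I}$ be the class of all open intervals in $\mathbb{R}$. Then $\mathcal{B}_1 = \sigma(\mathcal{I})$ is called the Borel $\sigma$-field on $\mathbb{R}$. The Borel $\sigma$-field in $\mathbb{R}^k$ (denoted by $\mathcal{B}_k$) is the $\sigma$-field generated by the class of all open rectangles in $\mathbb{R}^k$. A set $B \in \mathcal{B}_k$ is called a Borel set in $\mathbb{R}^k$; here $\mathbb{R}^k = \{(x_1, \ldots, x_k) : -\infty < x_i < \infty,\ i = 1, \ldots, k\}$ denotes the $k$-dimensional Euclidean space;

(vi) $\mathcal{B}_1$ contains all singletons and hence all countable subsets of $\mathbb{R}$ $\Big(\{a\} = \bigcap_{n=1}^{\infty} \big(a - \tfrac{1}{n},\ a + \tfrac{1}{n}\big)\Big)$. ▄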
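For a finite sample space the $\sigma$-field generated by a class of sets, as in Example 2.1 (iv), can be computed by brute force. The sketch below is only an illustration under that finiteness assumption: it closes the class under complements and pairwise unions until nothing new appears, which suffices because countable unions reduce to finite ones when $\Omega$ is finite. The function name `generated_sigma_field` is hypothetical.

```python
from itertools import combinations

def generated_sigma_field(omega, basic_sets):
    """Smallest sigma-field on a FINITE omega containing every set in basic_sets."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(s) for s in basic_sets}
    changed = True
    while changed:                      # repeat until closed under both operations
        changed = False
        current = list(field)
        for a in current:               # closure under complements
            if omega - a not in field:
                field.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):   # closure under (finite) unions
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field

sigma_A = generated_sigma_field({1, 2, 3, 4}, [{1}])
print(sorted(sorted(s) for s in sigma_A))   # the four sets: empty set, A, A^c, omega

sigma_singletons = generated_sigma_field({1, 2, 3}, [{1}, {2}, {3}])
print(len(sigma_singletons))                # 8, i.e. the whole power set
```

The first call reproduces Example 2.1 (ii) with $A = \{1\}$; the second shows that singletons of a finite set generate the power set.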

Let $\mathcal{C}$ be an appropriately chosen class of basic subsets of $\Omega$ for which the probabilities can be assigned to begin with (e.g., if $\Omega = \mathbb{R}$ then $\mathcal{C}$ may be the class of all open intervals in $\mathbb{R}$; if $\Omega$ is a countable set then $\mathcal{C}$ may be the class of all singletons $\{\omega\}$, $\omega \in \Omega$). It turns out (a topic for an advanced course in probability theory) that, for an appropriately chosen class $\mathcal{C}$ of basic sets, the assignment of probabilities that is consistent with properties (i)-(iii) of the classical (or relative frequency) method can be extended in a unique manner from $\mathcal{C}$ to $\sigma(\mathcal{C})$, the smallest $\sigma$-field containing the class $\mathcal{C}$. Therefore, generally the domain $\mathcal{F}$ of a probability measure is taken to be $\sigma(\mathcal{C})$, the $\sigma$-field generated by the class $\mathcal{C}$ of basic subsets of $\Omega$. We have stated before that we will not care about how the assignment of probabilities to various members of the event space $\mathcal{F}$ (a $\sigma$-field of subsets of $\Omega$) is done. Rather we will be interested in properties of a probability measure defined on the event space $\mathcal{F}$.

Let $\Omega$ be a sample space associated with a random experiment and let $\mathcal{F}$ be the event space (a $\sigma$-field of subsets of $\Omega$). Recall that members of $\mathcal{F}$ are called events. Now we provide a

mathematical definition of probability based on a set of axioms.

Definition 2.3

(i) Let $\mathcal{F}$ be a $\sigma$-field of subsets of $\Omega$. A probability function (or a probability measure) is a set function $P$, defined on $\mathcal{F}$, satisfying the following three axioms:

(a) $P(E) \ge 0$, $\forall E \in \mathcal{F}$ (Axiom 1: Non-negativity);

(b) If $E_1, E_2, \ldots$ is a countably infinite collection of mutually exclusive events (i.e., $E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, $E_i \cap E_j = \emptyset$, $i \ne j$) then

$$P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \quad \text{(Axiom 2: Countably infinite additivity);}$$

(c) $P(\Omega) = 1$ (Axiom 3: Probability of the sample space is 1).

(ii) The triplet $(\Omega, \mathcal{F}, P)$ is called a probability space. ▄

Remark 2.2

(i) Note that if $E_1, E_2, \ldots$ is a countably infinite collection of sets in a $\sigma$-field $\mathcal{F}$ then $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$ and, therefore, $P\big(\bigcup_{i=1}^{\infty} E_i\big)$ is well defined;

(ii) In any probability space $(\Omega, \mathcal{F}, P)$ we have $P(\Omega) = 1$ (or $P(\emptyset) = 0$; see Theorem 2.1 (i) proved later), but if $P(A) = 1$ (or $P(A) = 0$), for some $A \in \mathcal{F}$, then it does not mean that $A = \Omega$ (or $A = \emptyset$) (see Problem 14 (ii));

(iii) In general not all subsets of $\Omega$ are events, i.e., not all subsets of $\Omega$ are elements of $\mathcal{F}$;

(iv) When $\Omega$ is countable it is possible to assign probabilities to all subsets of $\Omega$ using Axiom 2, provided we can assign probabilities to singleton subsets $\{\omega\}$ of $\Omega$. To illustrate this let $\Omega = \{\omega_1, \omega_2, \ldots\}$ (or $\Omega = \{\omega_1, \ldots, \omega_n\}$, for some $n \in \mathbb{N}$) and let $P(\{\omega_i\}) = p_i$, $i = 1, 2, \ldots$, so that $0 \le p_i \le 1$, $i = 1, 2, \ldots$ (see Theorem 2.1 (iii) below) and $\sum_{i=1}^{\infty} p_i = \sum_{i=1}^{\infty} P(\{\omega_i\}) = P\big(\bigcup_{i=1}^{\infty} \{\omega_i\}\big) = P(\Omega) = 1$. Then, for any $A \subseteq \Omega$,

$$P(A) = \sum_{i:\, \omega_i \in A} p_i.$$

Thus in this case we may take $\mathcal{F} = \mathcal{P}(\Omega)$, the power set of $\Omega$. It is worth mentioning here that if $\Omega$ is countable and $\mathcal{C} = \{\{\omega\} : \omega \in \Omega\}$ (the class of all singleton subsets of $\Omega$) is the class of basic sets for which the assignment of the probabilities can be done, to begin with, then $\sigma(\mathcal{C}) = \mathcal{P}(\Omega)$ (see Problem 5 (ii));

(v) Due to some inconsistency problems, assignment of probabilities for all subsets of $\Omega$ is not possible when $\Omega$ is continuum (e.g., if $\Omega$ contains an interval). ▄
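Remark 2.2 (iv) says that on a countable sample space a probability measure is determined by the singleton probabilities $p_i$. The sketch below illustrates this under two assumed models that are not part of the notes: a fair die, and the toss-until-first-head experiment with $p_i = 2^{-i}$, where the infinite sum is truncated for numerical purposes. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

def prob(event, p):
    """P(A) = sum of p_i over outcomes in A; p maps each outcome to P({outcome})."""
    return sum((p[w] for w in event if w in p), Fraction(0))

# Finite case: a fair die (assumed model), p_i = 1/6 for each face.
die = {i: Fraction(1, 6) for i in range(1, 7)}
p_odd = prob({1, 3, 5}, die)
p_two_or_four = prob({2, 4}, die)
print(p_odd, p_two_or_four)        # 1/2 1/3

# Countably infinite case: first head on toss i has p_i = (1/2)**i (assumed fair coin).
# P(first head on an even toss) = sum over even i of 2**(-i); truncated at i < 200.
p_even = sum(0.5 ** i for i in range(2, 200, 2))
print(round(p_even, 6))
```

The die computation matches the earlier remark that an odd number of dots is more likely than 2 or 4 dots.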

Theorem 2.1

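Let $(\Omega, \mathcal{F}, P)$ be a probability space. Then

(i) $P(\emptyset) = 0$;

(ii) $E_i \in \mathcal{F}$, $i = 1, 2, \ldots, n$, and $E_i \cap E_j = \emptyset$, $i \ne j$ $\Rightarrow P\big(\bigcup_{i=1}^{n} E_i\big) = \sum_{i=1}^{n} P(E_i)$ (finite additivity);

(iii) $\forall E \in \mathcal{F}$, $0 \le P(E) \le 1$ and $P(E^c) = 1 - P(E)$;

(iv) $E_1, E_2 \in \mathcal{F}$ and $E_1 \subseteq E_2 \Rightarrow P(E_2 - E_1) = P(E_2) - P(E_1)$ and $P(E_1) \le P(E_2)$ (monotonicity of probability measures);

(v) $E_1, E_2 \in \mathcal{F} \Rightarrow P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$.

Proof.

(i) Let $E_1 = \Omega$ and $E_i = \emptyset$, $i = 2, 3, \ldots$. Then $P(E_1) = 1$ (Axiom 3), $E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, $E_1 = \bigcup_{i=1}^{\infty} E_i$ and $E_i \cap E_j = \emptyset$, $i \ne j$. Therefore,

$$1 = P(E_1) = P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \ \text{(using Axiom 2)} = 1 + \sum_{i=2}^{\infty} P(\emptyset)$$

$$\Rightarrow \sum_{i=2}^{\infty} P(\emptyset) = 0 \Rightarrow P(\emptyset) = 0.$$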

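(ii) Let $E_i = \emptyset$, $i = n + 1, n + 2, \ldots$. Then $E_i \in \mathcal{F}$, $i = 1, 2, \ldots$, $E_i \cap E_j = \emptyset$, $i \ne j$, and $P(E_i) = 0$, $i = n + 1, n + 2, \ldots$. Therefore,

$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \ \text{(using Axiom 2)} = \sum_{i=1}^{n} P(E_i).$$

(iii) Let $E \in \mathcal{F}$. Then $\Omega = E \cup E^c$ and $E \cap E^c = \emptyset$. Therefore

$$1 = P(\Omega) = P(E \cup E^c) = P(E) + P(E^c) \ \text{(using (ii))}$$

$\Rightarrow P(E) \le 1$ and $P(E^c) = 1 - P(E)$ (since $P(E^c) \ge 0$ by Axiom 1) $\Rightarrow 0 \le P(E) \le 1$ and $P(E^c) = 1 - P(E)$.

(iv) Let $E_1, E_2 \in \mathcal{F}$ and let $E_1 \subseteq E_2$. Then $E_2 - E_1 \in \mathcal{F}$, $E_2 = E_1 \cup (E_2 - E_1)$ and $E_1 \cap (E_2 - E_1) = \emptyset$.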

Figure 2.1

Therefore,

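$$P(E_2) = P(E_1 \cup (E_2 - E_1)) = P(E_1) + P(E_2 - E_1) \ \text{(using (ii))}$$

$$\Rightarrow P(E_2 - E_1) = P(E_2) - P(E_1).$$

As $P(E_2 - E_1) \ge 0$, it follows that $P(E_1) \le P(E_2)$.

(v) Let $E_1, E_2 \in \mathcal{F}$. Then $E_2 - E_1 \in \mathcal{F}$, $E_1 \cap (E_2 - E_1) = \emptyset$ and $E_1 \cup E_2 = E_1 \cup (E_2 - E_1)$.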

Figure 2.2

Therefore,

$$P(E_1 \cup E_2) = P(E_1 \cup (E_2 - E_1)) = P(E_1) + P(E_2 - E_1) \ \text{(using (ii))}. \tag{2.1}$$

Also $(E_1 \cap E_2) \cap (E_2 - E_1) = \emptyset$ and $E_2 = (E_1 \cap E_2) \cup (E_2 - E_1)$. Therefore,

Figure 2.3

$$P(E_2) = P((E_1 \cap E_2) \cup (E_2 - E_1)) = P(E_1 \cap E_2) + P(E_2 - E_1) \ \text{(using (ii))}$$

$$\Rightarrow P(E_2 - E_1) = P(E_2) - P(E_1 \cap E_2). \tag{2.2}$$

Using (2.1) and (2.2), we get

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2). \ ▄$$

Theorem 2.2 (Inclusion-Exclusion Formula)

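Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $E_1, E_2, \ldots, E_n \in \mathcal{F}$ $(n \in \mathbb{N},\ n \ge 2)$. Then

$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = \sum_{k=1}^{n} S_{k,n},$$

where $S_{1,n} = \sum_{i=1}^{n} P(E_i)$ and, for $k \in \{2, 3, \ldots, n\}$,

$$S_{k,n} = (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}).$$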

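Proof. We will use the principle of mathematical induction. Using Theorem 2.1 (v), we have

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = S_{1,2} + S_{2,2},$$

where $S_{1,2} = P(E_1) + P(E_2)$ and $S_{2,2} = -P(E_1 \cap E_2)$. Thus the result is true for $n = 2$. Now suppose that the result is true for $n \in \{2, 3, \ldots, m\}$, for some positive integer $m\ (\ge 2)$. Then

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) = P\Big(\Big(\bigcup_{i=1}^{m} E_i\Big) \cup E_{m+1}\Big)$$

$$= P\Big(\bigcup_{i=1}^{m} E_i\Big) + P(E_{m+1}) - P\Big(\Big(\bigcup_{i=1}^{m} E_i\Big) \cap E_{m+1}\Big) \quad \text{(using the result for } n = 2\text{)}$$

$$= P\Big(\bigcup_{i=1}^{m} E_i\Big) + P(E_{m+1}) - P\Big(\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\Big)$$

$$= \sum_{i=1}^{m} S_{i,m} + P(E_{m+1}) - P\Big(\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\Big) \quad \text{(using the result for } n = m\text{)}. \tag{2.3}$$

Let $F_i = E_i \cap E_{m+1}$, $i = 1, \ldots, m$. Then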

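$$P\Big(\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\Big) = P\Big(\bigcup_{i=1}^{m} F_i\Big) = \sum_{k=1}^{m} T_{k,m} \quad \text{(again using the result for } n = m\text{)}, \tag{2.4}$$

where $T_{1,m} = \sum_{i=1}^{m} P(F_i) = \sum_{i=1}^{m} P(E_i \cap E_{m+1})$ and, for $k \in \{2, 3, \ldots, m\}$,

$$T_{k,m} = (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le m} P(F_{i_1} \cap F_{i_2} \cap \cdots \cap F_{i_k}) = (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le m} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k} \cap E_{m+1}).$$

Using (2.4) in (2.3), we get

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) = \big(S_{1,m} + P(E_{m+1})\big) + (S_{2,m} - T_{1,m}) + \cdots + (S_{m,m} - T_{m-1,m}) - T_{m,m}.$$

Note that $S_{1,m} + P(E_{m+1}) = S_{1,m+1}$, $S_{k,m} - T_{k-1,m} = S_{k,m+1}$, $k = 2, 3, \ldots, m$, and $T_{m,m} = -S_{m+1,m+1}$. Therefore,

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) = S_{1,m+1} + \sum_{k=2}^{m+1} S_{k,m+1} = \sum_{k=1}^{m+1} S_{k,m+1}. \ ▄$$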
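The inclusion-exclusion formula can be checked numerically on a small equally likely model. The sketch below assumes $\Omega = \{1, \ldots, 30\}$ with the classical probability and takes $E_1, E_2, E_3$ to be the multiples of 2, 3 and 5; these choices and the helper names `prob` and `S` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 31))
events = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5)]
n = len(events)

def prob(event):
    """Classical probability |E| / |Omega| on the equally likely space omega."""
    return Fraction(len(event), len(omega))

def S(k):
    """S_{k,n} = (-1)^(k-1) * sum of P(E_{i1} ∩ ... ∩ E_{ik}) over all k-subsets."""
    total = sum(prob(frozenset.intersection(*c)) for c in combinations(events, k))
    return (-1) ** (k - 1) * total

lhs = prob(frozenset.union(*events))        # P(E_1 ∪ E_2 ∪ E_3) computed directly
rhs = sum(S(k) for k in range(1, n + 1))    # S_{1,3} + S_{2,3} + S_{3,3}
print(lhs, rhs)                             # 11/15 11/15
```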

Remark 2.3

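(i) Let $E_1, E_2, \ldots \in \mathcal{F}$. Then

$$P(E_1 \cup E_2 \cup E_3) = \underbrace{P(E_1) + P(E_2) + P(E_3)}_{p_{1,3}} - \underbrace{\big(P(E_1 \cap E_2) + P(E_1 \cap E_3) + P(E_2 \cap E_3)\big)}_{p_{2,3}} + \underbrace{P(E_1 \cap E_2 \cap E_3)}_{p_{3,3}}$$

$$= p_{1,3} - p_{2,3} + p_{3,3},$$

where $p_{1,3} = S_{1,3}$, $p_{2,3} = -S_{2,3}$ and $p_{3,3} = S_{3,3}$.

In general,

$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = p_{1,n} - p_{2,n} + p_{3,n} - \cdots + (-1)^{n-1} p_{n,n},$$

where

$$p_{i,n} = \begin{cases} S_{i,n}, & \text{if } i \text{ is odd}, \\ -S_{i,n}, & \text{if } i \text{ is even}, \end{cases} \qquad i = 1, 2, \ldots, n.$$

(ii) We have

$$1 \ge P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) \Rightarrow P(E_1 \cap E_2) \ge P(E_1) + P(E_2) - 1.$$

The above inequality is known as Bonferroni's inequality. ▄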

Theorem 2.3

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $E_1, E_2, \ldots, E_n \in \mathcal{F}$ $(n \in \mathbb{N},\ n \ge 2)$. Then, under the notations of Theorem 2.2,

(i) (Boole's Inequality) $S_{1,n} + S_{2,n} \le P\big(\bigcup_{i=1}^{n} E_i\big) \le S_{1,n}$;

(ii) (Bonferroni's Inequality) $P\big(\bigcap_{i=1}^{n} E_i\big) \ge S_{1,n} - (n - 1)$.

Proof.

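(i) We will use the principle of mathematical induction. We have

$$P(E_1 \cup E_2) = \underbrace{P(E_1) + P(E_2)}_{S_{1,2}} \underbrace{- P(E_1 \cap E_2)}_{S_{2,2}} = S_{1,2} + S_{2,2} \le S_{1,2},$$

where $S_{1,2} = P(E_1) + P(E_2)$ and $S_{2,2} = -P(E_1 \cap E_2) \le 0$.

Thus the result is true for $n = 2$. Now suppose that the result is true for $n \in \{2, 3, \ldots, m\}$, for some positive integer $m\ (\ge 2)$, i.e., suppose that for arbitrary events $F_1, \ldots, F_m \in \mathcal{F}$

$$P\Big(\bigcup_{i=1}^{k} F_i\Big) \le \sum_{i=1}^{k} P(F_i), \quad k = 2, 3, \ldots, m, \tag{2.5}$$

and

$$P\Big(\bigcup_{i=1}^{k} F_i\Big) \ge \sum_{i=1}^{k} P(F_i) - \sum_{1 \le i < j \le k} P(F_i \cap F_j), \quad k = 2, 3, \ldots, m. \tag{2.6}$$

Then

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) = P\Big(\Big(\bigcup_{i=1}^{m} E_i\Big) \cup E_{m+1}\Big)$$

$$\le P\Big(\bigcup_{i=1}^{m} E_i\Big) + P(E_{m+1}) \quad \text{(using (2.5) for } k = 2\text{)}$$

$$\le \sum_{i=1}^{m} P(E_i) + P(E_{m+1}) \quad \text{(using (2.5) for } k = m\text{)}$$

$$= \sum_{i=1}^{m+1} P(E_i) = S_{1,m+1}. \tag{2.7}$$

Also,

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) = P\Big(\Big(\bigcup_{i=1}^{m} E_i\Big) \cup E_{m+1}\Big)$$

$$= P\Big(\bigcup_{i=1}^{m} E_i\Big) + P(E_{m+1}) - P\Big(\Big(\bigcup_{i=1}^{m} E_i\Big) \cap E_{m+1}\Big) \quad \text{(using Theorem 2.2)}$$

$$= P\Big(\bigcup_{i=1}^{m} E_i\Big) + P(E_{m+1}) - P\Big(\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\Big). \tag{2.8}$$

Using (2.5), for $k = m$, we get

$$P\Big(\bigcup_{i=1}^{m} (E_i \cap E_{m+1})\Big) \le \sum_{i=1}^{m} P(E_i \cap E_{m+1}), \tag{2.9}$$

and using (2.6), for $k = m$, we get

$$P\Big(\bigcup_{i=1}^{m} E_i\Big) \ge S_{1,m} + S_{2,m}. \tag{2.10}$$

Now using (2.9) and (2.10) in (2.8), we get

$$P\Big(\bigcup_{i=1}^{m+1} E_i\Big) \ge S_{1,m} + S_{2,m} + P(E_{m+1}) - \sum_{i=1}^{m} P(E_i \cap E_{m+1})$$

$$= \sum_{i=1}^{m+1} P(E_i) - \sum_{1 \le i < j \le m+1} P(E_i \cap E_j)$$

$$= S_{1,m+1} + S_{2,m+1}. \tag{2.11}$$

Combining (2.7) and (2.11), we get

$$S_{1,m+1} + S_{2,m+1} \le P\Big(\bigcup_{i=1}^{m+1} E_i\Big) \le S_{1,m+1},$$

and the assertion follows by the principle of mathematical induction.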

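(ii) We have

$$P\Big(\bigcap_{i=1}^{n} E_i\Big) = 1 - P\Big(\Big(\bigcap_{i=1}^{n} E_i\Big)^c\Big)$$

$$= 1 - P\Big(\bigcup_{i=1}^{n} E_i^c\Big)$$

$$\ge 1 - \sum_{i=1}^{n} P(E_i^c) \quad \text{(using Boole's inequality)}$$

$$= 1 - \sum_{i=1}^{n} \big(1 - P(E_i)\big)$$

$$= \sum_{i=1}^{n} P(E_i) - (n - 1). \ ▄$$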

Remark 2.4

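Under the notation of Theorem 2.2 we can in fact prove the following inequalities:

$$\sum_{j=1}^{2k} S_{j,n} \le P\Big(\bigcup_{j=1}^{n} E_j\Big) \le \sum_{j=1}^{2k-1} S_{j,n}, \quad k = 1, 2, \ldots, \Big[\frac{n}{2}\Big],$$

where $\big[\frac{n}{2}\big]$ denotes the largest integer not exceeding $\frac{n}{2}$. ▄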
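The alternating bounds of Remark 2.4, together with Boole's and Bonferroni's inequalities from Theorem 2.3, can be observed on a concrete model. The sketch below assumes the equally likely space $\Omega = \{1, \ldots, 210\}$ and the events "multiple of 2, 3, 5, 7"; these are illustrative choices, and a numerical check is of course not a proof.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 211))
events = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5, 7)]
n = len(events)

def prob(event):
    """Classical probability on the equally likely space omega."""
    return Fraction(len(event), len(omega))

def S(k):
    """Signed sum S_{k,n} in the notation of Theorem 2.2."""
    total = sum(prob(frozenset.intersection(*c)) for c in combinations(events, k))
    return (-1) ** (k - 1) * total

p_union = prob(frozenset.union(*events))
# partial_sums[t - 1] = S_{1,n} + ... + S_{t,n}
partial_sums = [sum(S(j) for j in range(1, t + 1)) for t in range(1, n + 1)]
for t, s in enumerate(partial_sums, start=1):
    side = "upper bound" if t % 2 == 1 else "lower bound"
    print(f"t = {t}: partial sum = {s} ({side}), P(union) = {p_union}")

# Bonferroni's inequality: P(E_1 ∩ ... ∩ E_n) >= S_{1,n} - (n - 1).
p_intersection = prob(frozenset.intersection(*events))
bonferroni_lower = S(1) - (n - 1)
print(p_intersection >= bonferroni_lower)
```

Odd-length partial sums overshoot the probability of the union and even-length ones undershoot it, with equality at $t = n$.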

Corollary 2.1

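Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $E_1, E_2, \ldots, E_n \in \mathcal{F}$ be events. Then

(i) $P(E_i) = 0$, $i = 1, \ldots, n$ $\Leftrightarrow P\big(\bigcup_{i=1}^{n} E_i\big) = 0$;

(ii) $P(E_i) = 1$, $i = 1, \ldots, n$ $\Leftrightarrow P\big(\bigcap_{i=1}^{n} E_i\big) = 1$.

Proof.

(i) First suppose that $P(E_i) = 0$, $i = 1, \ldots, n$. Using Boole's inequality, we get

$$0 \le P\Big(\bigcup_{i=1}^{n} E_i\Big) \le \sum_{i=1}^{n} P(E_i) = 0.$$

It follows that $P\big(\bigcup_{i=1}^{n} E_i\big) = 0$. Conversely, suppose that $P\big(\bigcup_{j=1}^{n} E_j\big) = 0$. Then $E_i \subseteq \bigcup_{j=1}^{n} E_j$, $i = 1, \ldots, n$, and therefore,

$$0 \le P(E_i) \le P\Big(\bigcup_{j=1}^{n} E_j\Big) = 0, \quad i = 1, \ldots, n,$$

i.e., $P(E_i) = 0$, $i = 1, \ldots, n$.

(ii) We have

$$P(E_i) = 1,\ i = 1, \ldots, n \Leftrightarrow P(E_i^c) = 0,\ i = 1, \ldots, n$$

$$\Leftrightarrow P\Big(\bigcup_{i=1}^{n} E_i^c\Big) = 0 \quad \text{(using (i))}$$

$$\Leftrightarrow P\Big(\Big(\bigcup_{i=1}^{n} E_i^c\Big)^c\Big) = 1$$

$$\Leftrightarrow P\Big(\bigcap_{i=1}^{n} E_i\Big) = 1. \ ▄$$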

Definition 2.4

A countable collection $\{E_i : i \in \Lambda\}$ of events is said to be exhaustive if $P\big(\bigcup_{i \in \Lambda} E_i\big) = 1$. ▄

Example 2.2 (Equally Likely Probability Models)

Consider a probability space $(\Omega, \mathcal{F}, P)$. Suppose that, for some positive integer $k \ge 2$, $\Omega = \bigcup_{i=1}^{k} C_i$, where $C_1, C_2, \ldots, C_k$ are mutually exclusive, exhaustive and equally likely events, i.e., $C_i \cap C_j = \emptyset$ if $i \ne j$, $P\big(\bigcup_{i=1}^{k} C_i\big) = \sum_{i=1}^{k} P(C_i) = 1$ and $P(C_1) = \cdots = P(C_k) = \frac{1}{k}$. Further suppose that an event $E \in \mathcal{F}$ can be written as

$$E = C_{i_1} \cup C_{i_2} \cup \cdots \cup C_{i_r},$$

where $\{i_1, \ldots, i_r\} \subseteq \{1, \ldots, k\}$, $C_{i_j} \cap C_{i_l} = \emptyset$, $j \ne l$, and $r \in \{2, \ldots, k\}$. Then

$$P(E) = \sum_{j=1}^{r} P(C_{i_j}) = \frac{r}{k}.$$

Note that here $k$ is the total number of ways in which the random experiment can terminate (the number of partition sets $C_1, \ldots, C_k$), and $r$ is the number of ways that are favorable to $E \in \mathcal{F}$. Thus, for any such $E \in \mathcal{F}$,

$$P(E) = \frac{\text{number of cases favorable to } E}{\text{total number of cases}} = \frac{r}{k},$$

which is the same as the classical method of assigning probabilities. Here the assumption that $C_1, \ldots, C_k$ are equally likely is a part of probability modeling. ▄

For a finite sample space $\Omega$, when we say that an experiment has been performed at random we mean that the various possible outcomes in $\Omega$ are equally likely. For example, when we say that two numbers are chosen at random, without replacement, from the set $\{1, 2, 3\}$ then $\Omega = \{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$ and $P(\{1, 2\}) = P(\{1, 3\}) = P(\{2, 3\}) = \frac{1}{3}$, where $\{i, j\}$ indicates that the experiment terminates with the chosen numbers as $i$ and $j$, $i, j \in \{1, 2, 3\}$, $i \ne j$.

Example 2.3

Suppose that five cards are drawn at random and without replacement from a deck of 52 cards. Here the sample space $\Omega$ comprises all $\binom{52}{5}$ combinations of 5 cards. Thus the total number of cases is $\binom{52}{5} = k$, say. Let $C_1, \ldots, C_k$ be the singleton subsets of $\Omega$. Then $\Omega = \bigcup_{i=1}^{k} C_i$ and $P(C_1) = \cdots = P(C_k) = \frac{1}{k}$. Let $E_1$ be the event that each card is a spade. Then

Number of cases favorable to $E_1 = \binom{13}{5}$.

Therefore,

$$P(E_1) = \frac{\binom{13}{5}}{\binom{52}{5}}.$$

Now let $E_2$ be the event that at least one of the drawn cards is a spade. Then $E_2^c$ is the event that none of the drawn cards is a spade, and the number of cases favorable to $E_2^c$ is $\binom{39}{5}$. Therefore,

$$P(E_2^c) = \frac{\binom{39}{5}}{\binom{52}{5}},$$

and

$$P(E_2) = 1 - P(E_2^c) = 1 - \frac{\binom{39}{5}}{\binom{52}{5}}.$$

Let $E_3$ be the event that among the drawn cards three are kings and two are queens. Then the number of cases favorable to $E_3$ is $\binom{4}{3}\binom{4}{2}$ and, therefore,

$$P(E_3) = \frac{\binom{4}{3}\binom{4}{2}}{\binom{52}{5}}.$$

Similarly, if $E_4$ is the event that among the drawn cards two are kings, two are queens and one is a jack, then

$$P(E_4) = \frac{\binom{4}{2}\binom{4}{2}\binom{4}{1}}{\binom{52}{5}}. \ ▄$$
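The four probabilities of Example 2.3 are easy to evaluate with binomial coefficients. The following sketch assumes Python 3.8 or later (for `math.comb`); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(52, 5)                       # number of 5-card hands, k in the example

p_e1 = Fraction(comb(13, 5), total)       # all five cards are spades
p_e2 = 1 - Fraction(comb(39, 5), total)   # at least one spade (via the complement)
p_e3 = Fraction(comb(4, 3) * comb(4, 2), total)               # 3 kings, 2 queens
p_e4 = Fraction(comb(4, 2) * comb(4, 2) * comb(4, 1), total)  # 2 kings, 2 queens, 1 jack

for name, p in [("E1", p_e1), ("E2", p_e2), ("E3", p_e3), ("E4", p_e4)]:
    print(f"P({name}) = {p} ≈ {float(p):.6f}")
```

Exact fractions are used so that the values agree with the formulas without rounding error.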

Example 2.4

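Suppose that we have $n\ (\ge 2)$ letters and corresponding $n$ addressed envelopes. If these letters are inserted at random in the $n$ envelopes, find the probability that no letter is inserted into the correct envelope.

Solution. Let us label the letters as $L_1, L_2, \ldots, L_n$ and the respective envelopes as $A_1, A_2, \ldots, A_n$. Let $E_i$ denote the event that letter $L_i$ is (correctly) inserted into envelope $A_i$, $i = 1, 2, \ldots, n$. We need to find $P\big(\bigcap_{i=1}^{n} E_i^c\big)$. We have

$$P\Big(\bigcap_{i=1}^{n} E_i^c\Big) = P\Big(\Big(\bigcup_{i=1}^{n} E_i\Big)^c\Big) = 1 - P\Big(\bigcup_{i=1}^{n} E_i\Big) = 1 - \sum_{k=1}^{n} S_{k,n},$$

where, for $k \in \{1, 2, \ldots, n\}$,

$$S_{k,n} = (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}).$$

Note that $n$ letters can be inserted into $n$ envelopes in $n!$ ways. Also, for $1 \le i_1 < i_2 < \cdots < i_k \le n$, $E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}$ is the event that letters $L_{i_1}, L_{i_2}, \ldots, L_{i_k}$ are inserted into the correct envelopes. Clearly the number of cases favorable to this event is $(n - k)!$. Therefore, for $1 \le i_1 < i_2 < \cdots < i_k \le n$,

$$P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) = \frac{(n - k)!}{n!}$$

$$\Rightarrow S_{k,n} = (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \frac{(n - k)!}{n!} = (-1)^{k-1} \binom{n}{k} \frac{(n - k)!}{n!} = \frac{(-1)^{k-1}}{k!}$$

$$\Rightarrow P\Big(\bigcap_{i=1}^{n} E_i^c\Big) = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^{n}}{n!}. \ ▄$$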
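The closed form obtained in Example 2.4 can be compared with a direct enumeration of all $n!$ ways of inserting the letters for small $n$. In this sketch the function names are hypothetical, and brute force is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def p_no_match_formula(n: int) -> Fraction:
    """1/2! - 1/3! + ... + (-1)^n / n!, the result derived in the example."""
    return sum((Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1)), Fraction(0))

def p_no_match_bruteforce(n: int) -> Fraction:
    """Count arrangements in which no letter i lands in envelope i."""
    favorable = sum(
        all(perm[i] != i for i in range(n)) for perm in permutations(range(n))
    )
    return Fraction(favorable, factorial(n))

for n in range(2, 8):
    print(n, p_no_match_formula(n), p_no_match_bruteforce(n))

# The series is the partial sum of the expansion of exp(-1), so the
# probability approaches 1/e as n grows.
print(float(p_no_match_formula(12)), exp(-1))
```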

3. Conditional Probability and Independence of Events

Let $(\Omega, \mathcal{F}, P)$ be a given probability space. In many situations we may not be interested in the whole space $\Omega$. Rather we may be interested in a subset $B \in \mathcal{F}$ of the sample space $\Omega$. This may happen, for example, when we know a priori that the outcome of the experiment has to be an element of $B \in \mathcal{F}$.

Example 3.1

Consider a random experiment of shuffling a deck of 52 cards in such a way that all $52!$ arrangements of cards (when looked at from top to bottom) are equally likely.

Here,

$\Omega$ = all $52!$ permutations of cards,

and

$\mathcal{F} = \mathcal{P}(\Omega)$.

Now suppose that it is noticed that the bottom card is the king of hearts. In the light of this information, the sample space $B$ comprises the $51!$ arrangements of 52 cards with the bottom card as the king of hearts. Define the event

$K$: top card is a king.

For $E \in \mathcal{F}$, define

$P(E)$ = probability of event $E$ under sample space $\Omega$,

$P_B(E)$ = probability of event $E$ under sample space $B$.

Clearly,

$$P_B(K) = \frac{3 \times 50!}{51!}.$$

Note that

$$P_B(K) = \frac{3 \times 50!}{51!} = \frac{\frac{3 \times 50!}{52!}}{\frac{51!}{52!}} = \frac{P(K \cap B)}{P(B)},$$

i.e.,

$$P_B(K) = \frac{P(K \cap B)}{P(B)}. \tag{3.1}$$

We call $P_B(K)$ the conditional probability of event $K$ given that the experiment will result in an outcome in $B$ (i.e., the experiment will result in an outcome $\omega \in B$) and $P(K)$ the unconditional probability of event $K$. ▄
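Equation (3.1) can be checked both exactly and by simulation. The sketch below is an illustration under stated assumptions: cards are coded as integers 0 to 51 with card 0 standing for the king of hearts and cards 1, 2, 3 for the other kings, and only the top and bottom cards of each shuffle are sampled, since for a uniformly random arrangement they form a uniformly random ordered pair of distinct cards. The function name `simulate`, the seed and the trial count are arbitrary choices.

```python
import random
from fractions import Fraction
from math import factorial

# Exact values from the counting argument in the example.
p_k_and_b = Fraction(3 * factorial(50), factorial(52))   # P(K ∩ B)
p_b = Fraction(factorial(51), factorial(52))             # P(B)
p_k_given_b = p_k_and_b / p_b
print(p_k_given_b)                                       # 1/17

def simulate(n_trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P_B(K) as (#trials with K and B) / (#trials with B)."""
    rng = random.Random(seed)
    in_b = in_k_and_b = 0
    for _ in range(n_trials):
        top, bottom = rng.sample(range(52), 2)   # two distinct cards
        if bottom == 0:                          # B: bottom card is the king of hearts
            in_b += 1
            in_k_and_b += top in (1, 2, 3)       # K: top card is one of the other kings
    return in_k_and_b / in_b

estimate = simulate()
print(round(estimate, 3))
```

The simulated ratio should be close to $1/17 \approx 0.059$, up to sampling noise.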

Example 3.1 lays ground for introduction of the concept of conditional probability.

Let $(\Omega, \mathcal{F}, P)$ be a given probability space. Suppose that we know in advance that the outcome of the experiment has to be an element of $B \in \mathcal{F}$, where $P(B) > 0$. In such situations the sample space is $B$ and natural contenders for the membership of the event space are $\{A \cap B : A \in \mathcal{F}\}$. This raises the question whether $\mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}$ is an event space, i.e., whether $\mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}$ is a sigma-field of subsets of $B$.

Theorem 3.1

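Let $\mathcal{F}$ be a $\sigma$-field of subsets of $\Omega$ and let $B \in \mathcal{F}$. Define $\mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}$. Then $\mathcal{F}_B$ is a $\sigma$-field of subsets of $B$ and $\mathcal{F}_B \subseteq \mathcal{F}$.

Proof. Since $B \in \mathcal{F}$ and $\mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}$ it is obvious that $\mathcal{F}_B \subseteq \mathcal{F}$. We have $\Omega \in \mathcal{F}$ and therefore

$$B = \Omega \cap B \in \mathcal{F}_B. \tag{3.2}$$

Also,

$C \in \mathcal{F}_B \Rightarrow C = A \cap B$ for some $A \in \mathcal{F}$

$\Rightarrow C^c = B - C = (\Omega - A) \cap B$, where $\Omega - A \in \mathcal{F}$ (since $A \in \mathcal{F}$)

Figure 3.1

$$\Rightarrow C^c = B - C \in \mathcal{F}_B, \tag{3.3}$$

i.e., $\mathcal{F}_B$ is closed under complements with respect to $B$.

Now suppose that $C_i \in \mathcal{F}_B$, $i = 1, 2, \ldots$. Then $C_i = A_i \cap B$, for some $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$. Therefore,

$$\bigcup_{i=1}^{\infty} C_i = \Big(\bigcup_{i=1}^{\infty} A_i\Big) \cap B \in \mathcal{F}_B \quad \Big(\text{since } \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}, \text{ as } A_i \in \mathcal{F},\ i = 1, 2, \ldots\Big), \tag{3.4}$$

i.e., $\mathcal{F}_B$ is closed under countable unions.

Now (3.2), (3.3) and (3.4) imply that $\mathcal{F}_B$ is a $\sigma$-field of subsets of $B$. ▄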

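Equation (3.1) suggests considering the set function $P_B: \mathcal{F}_B \to \mathbb{R}$ defined by

$$P_B(C) = \frac{P(C)}{P(B)}, \quad C \in \mathcal{F}_B = \{A \cap B : A \in \mathcal{F}\}.$$

Note that, for $C \in \mathcal{F}_B$, $P(C)$ is well defined as $\mathcal{F}_B \subseteq \mathcal{F}$.

Let us define another set function $P(\cdot \mid B): \mathcal{F} \to \mathbb{R}$ by

$$P(A \mid B) = P_B(A \cap B) = \frac{P(A \cap B)}{P(B)}, \quad A \in \mathcal{F}.$$

Theorem 3.2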

Let ��, ℱ, )�be a probability space and let e ∈ ℱ be such that )�e� > 0. Then �e, ℱ½, )½� and ��, ℱ, )�⋅ |e�� are probability spaces.

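Proof. Clearly

$$P_B(C) = \frac{P(C)}{P(B)} \geq 0, \quad \forall\, C \in \mathcal{F}_B.$$

Let $C_i \in \mathcal{F}_B$, $i = 1, 2, \ldots$ be mutually exclusive. Then $C_i \in \mathcal{F}$, $i = 1, 2, \ldots$ (since $\mathcal{F}_B \subseteq \mathcal{F}$), and

$$P_B\Big(\bigcup_{i=1}^{\infty} C_i\Big) = \frac{P\big(\bigcup_{i=1}^{\infty} C_i\big)}{P(B)} = \frac{\sum_{i=1}^{\infty} P(C_i)}{P(B)} = \sum_{i=1}^{\infty} \frac{P(C_i)}{P(B)} = \sum_{i=1}^{\infty} P_B(C_i), \qquad (3.5)$$

i.e., $P_B$ is countably additive on $\mathcal{F}_B$.

Also

$$P_B(B) = \frac{P(B)}{P(B)} = 1.$$

Thus $P_B$ is a probability measure on $\mathcal{F}_B$.

Note that $P(A \mid B) \geq 0$, $\forall\, A \in \mathcal{F}$, and

$$P(\Omega \mid B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

Let $E_i \in \mathcal{F}$, $i = 1, 2, \ldots$ be mutually exclusive. Then $C_i = E_i \cap B \in \mathcal{F}_B$, $i = 1, 2, \ldots$ are mutually exclusive and

$$P\Big(\bigcup_{i=1}^{\infty} E_i \,\Big|\, B\Big) = P_B\Big(\bigcup_{i=1}^{\infty} C_i\Big) = \sum_{i=1}^{\infty} P_B(C_i) = \sum_{i=1}^{\infty} P_B(E_i \cap B) = \sum_{i=1}^{\infty} P(E_i \mid B) \quad \text{(using (3.5))}.$$

It follows that $P(\cdot \mid B)$ is a probability measure on $\mathcal{F}$. ▄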

Note that the domains of $P_B(\cdot)$ and $P(\cdot \mid B)$ are $\mathcal{F}_B$ and $\mathcal{F}$ respectively. Moreover,

$$P(A \mid B) = P_B(A \cap B) = \frac{P(A \cap B)}{P(B)}, \quad A \in \mathcal{F}.$$

Definition 3.1

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $B \in \mathcal{F}$ be a fixed event such that $P(B) > 0$. Define the set function $P(\cdot \mid B) : \mathcal{F} \to \mathbb{R}$ by

$$P(A \mid B) = P_B(A \cap B) = \frac{P(A \cap B)}{P(B)}, \quad A \in \mathcal{F}.$$

We call $P(A \mid B)$ the conditional probability of event $A$ given that the outcome of the experiment is in $B$, or simply the conditional probability of $A$ given $B$. ▄
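For a finite sample space, Definition 3.1 translates directly into a few lines of code. The following is a minimal sketch under assumptions of our own: the sample space is one cast of a fair die, and the names `prob` and `cond_prob` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def prob(event, weights):
    """P(event) on a finite sample space given as {outcome: probability}."""
    return sum((weights[w] for w in event), Fraction(0))

def cond_prob(a, b, weights):
    """P(A | B) = P(A ∩ B) / P(B); defined only when P(B) > 0."""
    p_b = prob(b, weights)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, weights) / p_b

# Assumed example: one cast of a fair die.
omega = set(range(1, 7))
weights = {w: Fraction(1, 6) for w in omega}
A = {2, 4, 6}   # an even number of dots
B = {4, 5, 6}   # more than three dots

print(cond_prob(A, B, weights))        # 2/3
print(cond_prob(omega, B, weights))    # 1, as required of a probability measure
```

Exact fractions are used so that identities such as additivity of $P(\cdot \mid B)$ over disjoint events can be compared with `==` rather than up to rounding.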

Example 3.2

Six cards are dealt at random (without replacement) from a deck of 52 cards. Find the probability that all cards in the hand are hearts (event $A$) given that there are at least 5 hearts in the hand (event $B$).

Solution. We have,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Clearly,

$$P(A \cap B) = P(A) = \frac{\binom{13}{6}}{\binom{52}{6}}, \quad \text{and} \quad P(B) = \frac{\binom{13}{5}\binom{39}{1} + \binom{13}{6}}{\binom{52}{6}}.$$

Therefore,

$$P(A \mid B) = \frac{\binom{13}{6}}{\binom{13}{5}\binom{39}{1} + \binom{13}{6}}. \ ▄$$
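The binomial coefficients in Example 3.2 are easy to evaluate exactly. The sketch below simply recomputes the three probabilities from the solution; the variable names are our own.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 6)  # number of equally likely six-card hands

# A: all six cards are hearts (A is contained in B, so A ∩ B = A).
p_a = Fraction(comb(13, 6), hands)

# B: at least five hearts = exactly five hearts and one non-heart, or six hearts.
p_b = Fraction(comb(13, 5) * comb(39, 1) + comb(13, 6), hands)

p_a_given_b = p_a / p_b
print(p_a_given_b)          # 4/121
print(float(p_a_given_b))
```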

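Remark 3.1

For events $E_1, \ldots, E_n \in \mathcal{F}$ $(n \geq 2)$,

$$P(E_1 \cap E_2) = P(E_1) P(E_2 \mid E_1), \quad \text{if } P(E_1) > 0,$$

and

$$P(E_1 \cap E_2 \cap E_3) = P((E_1 \cap E_2) \cap E_3) = P(E_1 \cap E_2) P(E_3 \mid E_1 \cap E_2) = P(E_1) P(E_2 \mid E_1) P(E_3 \mid E_1 \cap E_2),$$

if $P(E_1 \cap E_2) > 0$ (which also guarantees that $P(E_1) > 0$, since $E_1 \cap E_2 \subseteq E_1$).

Using the principle of mathematical induction it can be shown that

$$P\Big(\bigcap_{i=1}^{n} E_i\Big) = P(E_1) P(E_2 \mid E_1) P(E_3 \mid E_1 \cap E_2) \cdots P(E_n \mid E_1 \cap E_2 \cap \cdots \cap E_{n-1}),$$

provided $P(E_1 \cap E_2 \cap \cdots \cap E_{n-1}) > 0$ (which also guarantees that $P(E_1 \cap E_2 \cap \cdots \cap E_i) > 0$, $i = 1, 2, \ldots, n-1$). ▄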

Example 3.3

An urn contains four red and six black balls. Two balls are drawn successively, at random and without replacement, from the urn. Find the probability that the first draw resulted in a red ball and the second draw resulted in a black ball.

Solution. Define the events

$A$: first draw results in a red ball;

$B$: second draw results in a black ball.

Then,

$$\text{Required probability} = P(A \cap B) = P(A) P(B \mid A) = \frac{4}{10} \times \frac{6}{9} = \frac{12}{45}. \ ▄$$
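The multiplication rule used in Example 3.3 can be cross-checked by listing all equally likely ordered pairs of distinct balls. This is only a sketch; labelling the balls 0 to 9 is an assumption made for the enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Balls 0-3 are red, balls 4-9 are black (labels are illustrative only).
balls = ["R"] * 4 + ["B"] * 6

# All ordered draws of two distinct balls; each is equally likely.
draws = list(permutations(range(len(balls)), 2))
favourable = [d for d in draws if balls[d[0]] == "R" and balls[d[1]] == "B"]

p_enumerated = Fraction(len(favourable), len(draws))
p_rule = Fraction(4, 10) * Fraction(6, 9)   # P(A) * P(B | A)

print(p_enumerated, p_rule)   # 4/15 4/15
```

Both computations give 4/15, which is 12/45 in lowest terms.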

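Let $(\Omega, \mathcal{F}, P)$ be a probability space. For a countable collection $\{E_i : i \in \Lambda\}$ of mutually exclusive and exhaustive events, the following theorem provides a relationship between the marginal probability $P(E)$ of an event $E \in \mathcal{F}$ and the joint probabilities $P(E \cap E_i)$ of events $E$ and $E_i$, $i \in \Lambda$.

Theorem 3.3 (Theorem of Total Probability)

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{E_i : i \in \Lambda\}$ be a countable collection of mutually exclusive and exhaustive events (i.e., $E_i \cap E_j = \emptyset$ whenever $i \neq j$, and $P\big(\bigcup_{i \in \Lambda} E_i\big) = 1$) such that $P(E_i) > 0$, $\forall\, i \in \Lambda$. Then, for any event $E \in \mathcal{F}$,

$$P(E) = \sum_{i \in \Lambda} P(E \cap E_i) = \sum_{i \in \Lambda} P(E \mid E_i) P(E_i).$$

Proof. Let $F = \bigcup_{i \in \Lambda} E_i$. Then $P(F) = 1$ and $P(F^c) = 1 - P(F) = 0$. Therefore,

$$\begin{aligned}
P(E) &= P(E \cap F) + P(E \cap F^c) \\
&= P(E \cap F) \quad (E \cap F^c \subseteq F^c \Rightarrow 0 \leq P(E \cap F^c) \leq P(F^c) = 0) \\
&= P\Big(\bigcup_{i \in \Lambda} (E \cap E_i)\Big) \\
&= \sum_{i \in \Lambda} P(E \cap E_i) \quad (E_i\text{'s are disjoint} \Rightarrow \text{the sets } E \cap E_i\ (\subseteq E_i) \text{ are disjoint}) \\
&= \sum_{i \in \Lambda} P(E \mid E_i) P(E_i). \ ▄
\end{aligned}$$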

Example 3.4

Urn $U_1$ contains 4 white and 6 black balls and urn $U_2$ contains 6 white and 4 black balls. A fair die is cast and urn $U_1$ is selected if the upper face of the die shows 5 or 6 dots. Otherwise urn $U_2$ is selected. If a ball is drawn at random from the selected urn, find the probability that the drawn ball is white.

Solution. Define the events:

$W$: drawn ball is white;

$E_1$: urn $U_1$ is selected;

$E_2$: urn $U_2$ is selected.

Then $\{E_1, E_2\}$ is a collection of mutually exclusive and exhaustive events. Therefore

$$P(W) = P(E_1) P(W \mid E_1) + P(E_2) P(W \mid E_2) = \frac{2}{6} \times \frac{4}{10} + \frac{4}{6} \times \frac{6}{10} = \frac{8}{15}. \ ▄$$
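The computation in Example 3.4 is a direct instance of Theorem 3.3 with a two-element partition. A minimal sketch, assuming the hypothetical helper name `total_probability` and lists indexed by urn:

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(E) = sum over i of P(E | E_i) * P(E_i) for a partition E_1, E_2, ..."""
    return sum((p * l for p, l in zip(priors, likelihoods)), Fraction(0))

# Example 3.4: P(E_1) = 2/6 (die shows 5 or 6), P(E_2) = 4/6.
priors = [Fraction(2, 6), Fraction(4, 6)]
# P(W | E_1) = 4/10 and P(W | E_2) = 6/10 (share of white balls in each urn).
likelihoods = [Fraction(4, 10), Fraction(6, 10)]

p_white = total_probability(priors, likelihoods)
print(p_white)   # 8/15
```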

The following theorem provides a method for finding the probability of occurrence of an event in an earlier trial based on information about occurrences in later trials.

Theorem 3.4 (Bayes' Theorem)

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{E_i : i \in \Lambda\}$ be a countable collection of mutually exclusive and exhaustive events with $P(E_i) > 0$, $i \in \Lambda$. Then, for any event $E \in \mathcal{F}$ with $P(E) > 0$, we have

$$P(E_j \mid E) = \frac{P(E \mid E_j) P(E_j)}{\sum_{i \in \Lambda} P(E \mid E_i) P(E_i)}, \quad j \in \Lambda.$$

Proof. We have, for $j \in \Lambda$,

$$\begin{aligned}
P(E_j \mid E) &= \frac{P(E_j \cap E)}{P(E)} \\
&= \frac{P(E \mid E_j) P(E_j)}{P(E)} \\
&= \frac{P(E \mid E_j) P(E_j)}{\sum_{i \in \Lambda} P(E \mid E_i) P(E_i)} \quad \text{(using the Theorem of Total Probability)}. \ ▄
\end{aligned}$$

Remark 3.2

(i) Suppose that the occurrence of any one of the mutually exclusive and exhaustive events $E_i$, $i \in \Lambda$, causes the occurrence of an event $E$. Given that the event $E$ has occurred, Bayes' theorem provides the conditional probability that the event $E$ is caused by the occurrence of event $E_j$, $j \in \Lambda$.

(ii) In Bayes' theorem the probabilities $P(E_j)$, $j \in \Lambda$, are referred to as prior probabilities and the probabilities $P(E_j \mid E)$, $j \in \Lambda$, are referred to as posterior probabilities. ▄

To see an application of Bayes' theorem let us revisit Example 3.4.

Example 3.5

Urn $U_1$ contains 4 white and 6 black balls and urn $U_2$ contains 6 white and 4 black balls. A fair die is cast and urn $U_1$ is selected if the upper face of the die shows five or six dots. Otherwise urn $U_2$ is selected. A ball is drawn at random from the selected urn.

(i) Given that the drawn ball is white, find the conditional probability that it came from urn $U_1$;

(ii) Given that the drawn ball is white, find the conditional probability that it came from urn $U_2$.

Solution. Define the events:

$W$: drawn ball is white;

$E_1$: urn $U_1$ is selected;

$E_2$: urn $U_2$ is selected.

Here $E_1$ and $E_2$ are mutually exclusive and exhaustive events.

(i) We have

$$P(E_1 \mid W) = \frac{P(W \mid E_1) P(E_1)}{P(W \mid E_1) P(E_1) + P(W \mid E_2) P(E_2)} = \frac{\frac{4}{10} \times \frac{2}{6}}{\frac{4}{10} \times \frac{2}{6} + \frac{6}{10} \times \frac{4}{6}} = \frac{1}{4}.$$

(ii) Since $E_1$ and $E_2$ are mutually exclusive and $P(E_1 \cup E_2 \mid W) = P(\Omega \mid W) = 1$, we have

$$P(E_2 \mid W) = 1 - P(E_1 \mid W) = \frac{3}{4}. \ ▄$$
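Both parts of Example 3.5 follow from one application of Bayes' theorem to the partition formed by the two urn-selection events. The sketch below assumes a hypothetical helper `posteriors` that returns all posterior probabilities at once.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Posterior P(E_j | E) for each j, given priors P(E_j) and likelihoods P(E | E_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E ∩ E_j)
    evidence = sum(joint, Fraction(0))                     # P(E), by total probability
    if evidence == 0:
        raise ValueError("P(E) must be positive")
    return [j / evidence for j in joint]

priors = [Fraction(2, 6), Fraction(4, 6)]          # urn 1, urn 2
likelihoods = [Fraction(4, 10), Fraction(6, 10)]   # P(white | urn 1), P(white | urn 2)

post = posteriors(priors, likelihoods)
print(post[0], post[1])   # 1/4 3/4
```

The posteriors sum to one, which is the fact used in part (ii) of the solution.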

In the above example

$$P(E_1 \mid W) = \frac{1}{4} < \frac{1}{3} = P(E_1), \quad \text{and} \quad P(E_2 \mid W) = \frac{3}{4} > \frac{2}{3} = P(E_2),$$

i.e.,

(i) the probability of occurrence of event $E_1$ decreases in the presence of the information that the outcome will be an element of $W$;

(ii) the probability of occurrence of event $E_2$ increases in the presence of the information that the outcome will be an element of $W$.

These phenomena are related to the concept of association defined in the sequel.

Note that

$$P(E_1 \mid W) < P(E_1) \Leftrightarrow P(E_1 \cap W) < P(E_1) P(W),$$

and

$$P(E_2 \mid W) > P(E_2) \Leftrightarrow P(E_2 \cap W) > P(E_2) P(W).$$

Definition 3.2

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A$ and $B$ be two events. Events $A$ and $B$ are said to be

(i) negatively associated if $P(A \cap B) < P(A) P(B)$;

(ii) positively associated if $P(A \cap B) > P(A) P(B)$;

(iii) independent if $P(A \cap B) = P(A) P(B)$. ▄

Remark 3.3

(i) If $P(B) = 0$ then $P(A \cap B) = 0 = P(A) P(B)$, $\forall\, A \in \mathcal{F}$, i.e., if $P(B) = 0$ then any event $A \in \mathcal{F}$ and $B$ are independent;

(ii) If $P(B) > 0$ then $A$ and $B$ are independent if, and only if, $P(A \mid B) = P(A)$, i.e., if $P(B) > 0$, then events $A$ and $B$ are independent if, and only if, the availability of the information that event $B$ has occurred does not alter the probability of occurrence of event $A$. ▄

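Now we define the concept of independence for an arbitrary collection of events.

Definition 3.3

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\Lambda \subseteq \mathbb{R}$ be an index set and let $\{E_\alpha : \alpha \in \Lambda\}$ be a collection of events in $\mathcal{F}$.

(i) Events $\{E_\alpha : \alpha \in \Lambda\}$ are said to be pairwise independent if any pair of events $E_\alpha$ and $E_\beta$, $\alpha \neq \beta$, in the collection $\{E_\alpha : \alpha \in \Lambda\}$ are independent, i.e., if $P(E_\alpha \cap E_\beta) = P(E_\alpha) P(E_\beta)$, whenever $\alpha, \beta \in \Lambda$ and $\alpha \neq \beta$;

(ii) Let $\Lambda = \{1, 2, \ldots, n\}$, for some $n \in \mathbb{N}$, so that $\{E_\alpha : \alpha \in \Lambda\} = \{E_1, \ldots, E_n\}$ is a finite collection of events in $\mathcal{F}$. Events $E_1, \ldots, E_n$ are said to be independent if, for any sub-collection $\{E_{\alpha_1}, \ldots, E_{\alpha_k}\}$ of $\{E_1, \ldots, E_n\}$ $(k = 2, 3, \ldots, n)$,

$$P\Big(\bigcap_{j=1}^{k} E_{\alpha_j}\Big) = \prod_{j=1}^{k} P(E_{\alpha_j}). \qquad (3.6)$$

(iii) Let $\Lambda \subseteq \mathbb{R}$ be an arbitrary index set. Events $\{E_\alpha : \alpha \in \Lambda\}$ are said to be independent if any finite sub-collection of events in $\{E_\alpha : \alpha \in \Lambda\}$ forms a collection of independent events. ▄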

Remark 3.4

(i) To verify that $n$ events $E_1, \ldots, E_n \in \mathcal{F}$ are independent one must verify $2^n - n - 1 \ \big(= \sum_{j=2}^{n} \binom{n}{j}\big)$ conditions in (3.6). For example, to conclude that three events $E_1$, $E_2$ and $E_3$ are independent, the following $4\ (= 2^3 - 3 - 1)$ conditions must be verified:

$$P(E_1 \cap E_2) = P(E_1) P(E_2); \quad P(E_1 \cap E_3) = P(E_1) P(E_3);$$

$$P(E_2 \cap E_3) = P(E_2) P(E_3); \quad P(E_1 \cap E_2 \cap E_3) = P(E_1) P(E_2) P(E_3).$$

(ii) If events $E_1, \ldots, E_n$ are independent then, for any permutation $(\alpha_1, \ldots, \alpha_n)$ of $(1, \ldots, n)$, the events $E_{\alpha_1}, \ldots, E_{\alpha_n}$ are also independent. Thus the notion of independence is symmetric in the events involved.

(iii) Events in any sub-collection of independent events are independent. In particular, independence of a collection of events implies their pairwise independence. ▄
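The count in Remark 3.4 (i) can be confirmed by listing the index sets of all sub-collections with at least two events. This is a sketch only; `independence_conditions` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb

def independence_conditions(n):
    """Index sets of all sub-collections of size >= 2 of the events E_1, ..., E_n."""
    return [
        subset
        for k in range(2, n + 1)
        for subset in combinations(range(1, n + 1), k)
    ]

# One product condition (3.6) per sub-collection: 2^n - n - 1 in total.
for n in range(2, 8):
    count = len(independence_conditions(n))
    assert count == 2**n - n - 1 == sum(comb(n, j) for j in range(2, n + 1))

print(independence_conditions(3))
# [(1, 2), (1, 3), (2, 3), (1, 2, 3)]
```

For $n = 3$ the four index sets correspond exactly to the four conditions listed in the remark.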

The following example illustrates that, in general, pairwise independence of a collection of events may not imply their independence.

Example 3.6

Let $\Omega = \{1, 2, 3, 4\}$ and let $\mathcal{F} = \mathcal{P}(\Omega)$, the power set of $\Omega$. Consider the probability space $(\Omega, \mathcal{F}, P)$, where $P(\{i\}) = \frac{1}{4}$, $i = 1, 2, 3, 4$. Let $A = \{1, 4\}$, $B = \{2, 4\}$ and $C = \{3, 4\}$. Then,

$$P(A) = P(B) = P(C) = \frac{1}{2},$$

$$P(A \cap B) = P(A \cap C) = P(B \cap C) = P(\{4\}) = \frac{1}{4},$$

and

$$P(A \cap B \cap C) = P(\{4\}) = \frac{1}{4}.$$

Clearly,

$$P(A \cap B) = P(A) P(B); \quad P(A \cap C) = P(A) P(C), \quad \text{and} \quad P(B \cap C) = P(B) P(C),$$

i.e., $A$, $B$ and $C$ are pairwise independent.

However,

$$P(A \cap B \cap C) = \frac{1}{4} \neq \frac{1}{8} = P(A) P(B) P(C).$$

Thus $A$, $B$ and $C$ are not independent. ▄
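Example 3.6 is small enough to verify mechanically. The sketch below checks condition (3.6) for every sub-collection of the three events; the helper names `prob` and `product_rule_holds` are our own.

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}

def prob(event):
    """Equally likely outcomes: P(event) = |event| / |omega|."""
    return Fraction(len(event), len(omega))

def product_rule_holds(events):
    """Check P(intersection of events) == product of the individual probabilities."""
    intersection = set(omega)
    product = Fraction(1)
    for e in events:
        intersection &= e
        product *= prob(e)
    return prob(intersection) == product

A, B, C = {1, 4}, {2, 4}, {3, 4}

pairwise = all(product_rule_holds(pair) for pair in combinations([A, B, C], 2))
triple = product_rule_holds([A, B, C])

print(pairwise)   # True: A, B, C are pairwise independent
print(triple)     # False: 1/4 != 1/8, so they are not independent
```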

Theorem 3.5

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A$ and $B$ be independent events ($A, B \in \mathcal{F}$). Then

(i) $A^c$ and $B$ are independent events;

(ii) $A$ and $B^c$ are independent events;

(iii) $A^c$ and $B^c$ are independent events.

Proof. We have

$$P(A \cap B) = P(A) P(B).$$

(i) Since $B = (A \cap B) \cup (A^c \cap B)$ and $(A \cap B) \cap (A^c \cap B) = \emptyset$, we have

$$\begin{aligned}
P(B) = P(A \cap B) + P(A^c \cap B) \Rightarrow P(A^c \cap B) &= P(B) - P(A \cap B) \\
&= P(B) - P(A) P(B) \\
&= (1 - P(A)) P(B) \\
&= P(A^c) P(B),
\end{aligned}$$

i.e., $A^c$ and $B$ are independent events.

(ii) Follows from (i) by interchanging the roles of $A$ and $B$.

(iii) Follows on using (i) and (ii) sequentially. ▄

The following theorem strengthens the results of Theorem 3.5.

Theorem 3.6

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $F_1, \ldots, F_n$ $(n \in \mathbb{N}, n \geq 2)$ be independent events in $\mathcal{F}$. Then, for any $k \in \{1, 2, \ldots, n-1\}$ and any permutation $(\alpha_1, \ldots, \alpha_n)$ of $(1, \ldots, n)$, the events $F_{\alpha_1}, \ldots, F_{\alpha_k}, F_{\alpha_{k+1}}^c, \ldots, F_{\alpha_n}^c$ are independent. Moreover the events $F_1^c, \ldots, F_n^c$ are independent.

Proof. Since the notion of independence is symmetric in the events involved, it is enough to show that for any $k \in \{1, 2, \ldots, n-1\}$ the events $F_1, \ldots, F_k, F_{k+1}^c, \ldots, F_n^c$ are independent. Using backward induction and symmetry in the notion of independence, the above mentioned assertion would follow if, under the hypothesis of the theorem, we show that the events $F_1, \ldots, F_{n-1}, F_n^c$ are independent. For this consider a sub-collection $\{F_{i_1}, \ldots, F_{i_m}, G\}$ of $\{F_1, \ldots, F_{n-1}, F_n^c\}$ $(\{i_1, \ldots, i_m\} \subseteq \{1, \ldots, n-1\})$, where $G = F_n^c$ or $G = F_j$, for some $j \in \{1, \ldots, n-1\} - \{i_1, \ldots, i_m\}$, depending on whether or not $F_n^c$ is a part of the sub-collection $\{F_{i_1}, \ldots, F_{i_m}, G\}$. Thus the following two cases arise:

Case I. $G = F_n^c$

Since $F_1, \ldots, F_n$ are independent, we have

$$P\Big(\bigcap_{j=1}^{m} F_{i_j}\Big) = \prod_{j=1}^{m} P(F_{i_j}),$$

and

$$P\Big(\Big(\bigcap_{j=1}^{m} F_{i_j}\Big) \cap F_n\Big) = \Big[\prod_{j=1}^{m} P(F_{i_j})\Big] P(F_n) = P\Big(\bigcap_{j=1}^{m} F_{i_j}\Big) P(F_n)$$

$\Rightarrow$ the events $\bigcap_{j=1}^{m} F_{i_j}$ and $F_n$ are independent

$\Rightarrow$ the events $\bigcap_{j=1}^{m} F_{i_j}$ and $F_n^c$ are independent (Theorem 3.5 (ii))

$$\Rightarrow P\Big(\Big(\bigcap_{j=1}^{m} F_{i_j}\Big) \cap F_n^c\Big) = P\Big(\bigcap_{j=1}^{m} F_{i_j}\Big) P(F_n^c) = \Big[\prod_{j=1}^{m} P(F_{i_j})\Big] P(F_n^c)$$

$$\Rightarrow P(F_{i_1} \cap \cdots \cap F_{i_m} \cap G) = \Big[\prod_{j=1}^{m} P(F_{i_j})\Big] P(G).$$

Case II. $G = F_j$, for some $j \in \{1, \ldots, n-1\} - \{i_1, \ldots, i_m\}$.

In this case $\{F_{i_1}, \ldots, F_{i_m}, G\}$ is a sub-collection of the independent events $F_1, \ldots, F_n$ and therefore

$$P(F_{i_1} \cap \cdots \cap F_{i_m} \cap G) = \Big[\prod_{j=1}^{m} P(F_{i_j})\Big] P(G).$$

Now the result follows on combining the two cases. ▄

When we say that two or more random experiments are independent (or that two or more

random experiments are performed independently) it simply means that the events associated

with the respective random experiments are independent.

4. Continuity of Probability Measures

We begin this section with the following definition.

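Definition 4.1

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{A_n : n = 1, 2, \ldots\}$ be a sequence of events in $\mathcal{F}$.

(i) We say that the sequence $\{A_n : n = 1, 2, \ldots\}$ is increasing (written as $A_n \uparrow$) if $A_n \subseteq A_{n+1}$, $n = 1, 2, \ldots$;

(ii) We say that the sequence $\{A_n : n = 1, 2, \ldots\}$ is decreasing (written as $A_n \downarrow$) if $A_{n+1} \subseteq A_n$, $n = 1, 2, \ldots$;

(iii) We say that the sequence $\{A_n : n = 1, 2, \ldots\}$ is monotone if either $A_n \uparrow$ or $A_n \downarrow$;

(iv) If $A_n \uparrow$ we define the limit of the sequence $\{A_n : n = 1, 2, \ldots\}$ as $\bigcup_{n=1}^{\infty} A_n$ and write $\operatorname{Lim}_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$;

(v) If $A_n \downarrow$ we define the limit of the sequence $\{A_n : n = 1, 2, \ldots\}$ as $\bigcap_{n=1}^{\infty} A_n$ and write $\operatorname{Lim}_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$. ▄

Throughout we will denote the limit of a monotone sequence $\{A_n : n = 1, 2, \ldots\}$ of events by $\operatorname{Lim}_{n \to \infty} A_n$ and the limit of a sequence $\{a_n : n = 1, 2, \ldots\}$ of real numbers (provided it exists) by $\lim_{n \to \infty} a_n$.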

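Theorem 4.1 (Continuity of Probability Measures)

Let $\{A_n : n = 1, 2, \ldots\}$ be a monotone sequence of events in a probability space $(\Omega, \mathcal{F}, P)$. Then

$$P\Big(\operatorname{Lim}_{n \to \infty} A_n\Big) = \lim_{n \to \infty} P(A_n).$$

Proof.

Case I. $A_n \uparrow$

In this case, $\operatorname{Lim}_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n$. Define $B_1 = A_1$, $B_n = A_n - A_{n-1}$, $n = 2, 3, \ldots$.

[Figure 4.1]

Then $B_n \in \mathcal{F}$, $n = 1, 2, \ldots$, the $B_n$'s are mutually exclusive and $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n = \operatorname{Lim}_{n \to \infty} A_n$. Therefore,

$$\begin{aligned}
P\Big(\operatorname{Lim}_{n \to \infty} A_n\Big) &= P\Big(\bigcup_{n=1}^{\infty} B_n\Big) \\
&= \sum_{n=1}^{\infty} P(B_n) \\
&= \lim_{n \to \infty} \sum_{k=1}^{n} P(B_k) \\
&= \lim_{n \to \infty} \Big[P(A_1) + \sum_{k=2}^{n} P(A_k - A_{k-1})\Big] \\
&= \lim_{n \to \infty} \Big[P(A_1) + \sum_{k=2}^{n} \big(P(A_k) - P(A_{k-1})\big)\Big] \quad \text{(using Theorem 2.1 (iv), since } A_{k-1} \subseteq A_k,\ k = 2, 3, \ldots\text{)} \\
&= \lim_{n \to \infty} \Big[P(A_1) + \sum_{k=2}^{n} P(A_k) - \sum_{k=2}^{n} P(A_{k-1})\Big] \\
&= \lim_{n \to \infty} \big[P(A_1) + P(A_n) - P(A_1)\big] \\
&= \lim_{n \to \infty} P(A_n).
\end{aligned}$$

Case II. $A_n \downarrow$

In this case, $\operatorname{Lim}_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n$ and $A_n^c \uparrow$. Therefore,

$$\begin{aligned}
P\Big(\operatorname{Lim}_{n \to \infty} A_n\Big) &= P\Big(\bigcap_{n=1}^{\infty} A_n\Big) \\
&= 1 - P\Big(\Big(\bigcap_{n=1}^{\infty} A_n\Big)^c\Big) \\
&= 1 - P\Big(\bigcup_{n=1}^{\infty} A_n^c\Big) \\
&= 1 - P\Big(\operatorname{Lim}_{n \to \infty} A_n^c\Big) \\
&= 1 - \lim_{n \to \infty} P(A_n^c) \quad \text{(using Case I, since } A_n^c \uparrow\text{)} \\
&= 1 - \lim_{n \to \infty} \big(1 - P(A_n)\big) \\
&= \lim_{n \to \infty} P(A_n). \ ▄
\end{aligned}$$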

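Remark 4.1

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{E_i : i = 1, 2, \ldots\}$ be a countably infinite collection of events in $\mathcal{F}$. Define

$$B_n = \bigcup_{i=1}^{n} E_i \quad \text{and} \quad C_n = \bigcap_{i=1}^{n} E_i, \quad n = 1, 2, \ldots$$

Then $B_n \uparrow$, $C_n \downarrow$, $\operatorname{Lim}_{n \to \infty} B_n = \bigcup_{n=1}^{\infty} B_n = \bigcup_{i=1}^{\infty} E_i$ and $\operatorname{Lim}_{n \to \infty} C_n = \bigcap_{n=1}^{\infty} C_n = \bigcap_{i=1}^{\infty} E_i$. Therefore

$$\begin{aligned}
P\Big(\bigcup_{i=1}^{\infty} E_i\Big) &= P\Big(\operatorname{Lim}_{n \to \infty} B_n\Big) \\
&= \lim_{n \to \infty} P(B_n) \quad \text{(using Theorem 4.1)} \\
&= \lim_{n \to \infty} P\Big(\bigcup_{i=1}^{n} E_i\Big) \\
&= \lim_{n \to \infty} \big[S_{1,n} + S_{2,n} + \cdots + S_{n,n}\big],
\end{aligned}$$

where the $S_{k,n}$'s are as defined in Theorem 2.2.

Moreover,

$$\begin{aligned}
P\Big(\bigcap_{i=1}^{\infty} E_i\Big) &= P\Big(\operatorname{Lim}_{n \to \infty} C_n\Big) \\
&= \lim_{n \to \infty} P(C_n) \quad \text{(using Theorem 4.1)} \\
&= \lim_{n \to \infty} P\Big(\bigcap_{i=1}^{n} E_i\Big).
\end{aligned}$$

Similarly, if $\{E_i : i = 1, 2, \ldots\}$ is a collection of independent events, then

$$P\Big(\bigcap_{i=1}^{\infty} E_i\Big) = \lim_{n \to \infty} P\Big(\bigcap_{i=1}^{n} E_i\Big) = \lim_{n \to \infty} \Big[\prod_{i=1}^{n} P(E_i)\Big] = \prod_{i=1}^{\infty} P(E_i). \ ▄$$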
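The last identity of Remark 4.1 can be illustrated numerically. The sketch below assumes, purely for illustration, independent events with $P(E_i) = 1 - 1/(i+1)^2$; this choice is not from the text. It was picked because the finite products then have the closed form $(n+2)/(2(n+1))$, so the limit $1/2$ is visible exactly.

```python
from fractions import Fraction

def p_event(i):
    """Assumed P(E_i) = 1 - 1/(i+1)^2 for independent events E_1, E_2, ..."""
    return 1 - Fraction(1, (i + 1) ** 2)

def p_finite_intersection(n):
    """P(E_1 ∩ ... ∩ E_n), which equals the product of the P(E_i) under independence."""
    result = Fraction(1)
    for i in range(1, n + 1):
        result *= p_event(i)
    return result

# The decreasing sequence C_n = E_1 ∩ ... ∩ E_n has probabilities tending to 1/2.
for n in (1, 10, 100, 1000):
    print(n, float(p_finite_intersection(n)))
```

By Theorem 4.1 the printed values approximate the probability of the infinite intersection, which under this assumption equals 1/2.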

Problems

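1. Let $\Omega = \{1, 2, 3, 4\}$. Check which of the following is a sigma-field of subsets of $\Omega$:

(i) $\mathcal{F}_1 = \{\emptyset, \{1, 2\}, \{3, 4\}\}$;

(ii) $\mathcal{F}_2 = \{\emptyset, \Omega, \{1\}, \{2, 3, 4\}, \{1, 2\}, \{3, 4\}\}$;

(iii) $\mathcal{F}_3 = \{\emptyset, \Omega, \{1\}, \{2\}, \{1, 2\}, \{3, 4\}, \{2, 3, 4\}, \{1, 3, 4\}\}$.

2. Show that a class $\mathcal{F}$ of subsets of $\Omega$ is a sigma-field of subsets of $\Omega$ if, and only if, the following three conditions are satisfied: (i) $\Omega \in \mathcal{F}$; (ii) $A \in \mathcal{F} \Rightarrow A^c = \Omega - A \in \mathcal{F}$; (iii) $A_n \in \mathcal{F}$, $n = 1, 2, \ldots \Rightarrow \bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$.

3. Let $\{\mathcal{F}_\lambda : \lambda \in \Lambda\}$ be a collection of sigma-fields of subsets of $\Omega$.

(i) Show that $\bigcap_{\lambda \in \Lambda} \mathcal{F}_\lambda$ is a sigma-field;

(ii) Using a counterexample show that $\bigcup_{\lambda \in \Lambda} \mathcal{F}_\lambda$ may not be a sigma-field;

(iii) Let $\mathcal{C}$ be a class of subsets of $\Omega$ and let $\{\mathcal{F}_\lambda : \lambda \in \Lambda\}$ be the collection of all sigma-fields that contain the class $\mathcal{C}$. Show that $\sigma(\mathcal{C}) = \bigcap_{\lambda \in \Lambda} \mathcal{F}_\lambda$, where $\sigma(\mathcal{C})$ denotes the smallest sigma-field containing the class $\mathcal{C}$ (or the sigma-field generated by the class $\mathcal{C}$).

4. Let $\Omega$ be an infinite set and let $\mathcal{A} = \{A \subseteq \Omega : A \text{ is finite or } A^c \text{ is finite}\}$.

(i) Show that $\mathcal{A}$ is closed under complements and finite unions;

(ii) Using a counterexample show that $\mathcal{A}$ may not be closed under countably infinite unions (and hence $\mathcal{A}$ may not be a sigma-field).

5. (i) Let $\Omega$ be an uncountable set and let $\mathcal{F} = \{A \subseteq \Omega : A \text{ is countable or } A^c \text{ is countable}\}$.

(a) Show that $\mathcal{F}$ is a sigma-field;

(b) What can you say about $\mathcal{F}$ when $\Omega$ is countable?

(ii) Let $\Omega$ be a countable set and let $\mathcal{C} = \{\{\omega\} : \omega \in \Omega\}$. Show that $\sigma(\mathcal{C}) = \mathcal{P}(\Omega)$.

6. Let $\mathcal{F} = \mathcal{P}(\Omega)$ = the power set of $\Omega = \{0, 1, 2, \ldots\}$. In each of the following cases, verify if $(\Omega, \mathcal{F}, P)$ is a probability space:

(i) $P(A) = \sum_{x \in A} e^{-\lambda} \lambda^x / x!$, $A \in \mathcal{F}$, $\lambda > 0$;

(ii) $P(A) = \sum_{x \in A} p(1 - p)^x$, $A \in \mathcal{F}$, $0 < p < 1$;

(iii) $P(A) = 0$, if $A$ has a finite number of elements, and $P(A) = 1$, if $A$ has an infinite number of elements, $A \in \mathcal{F}$.

7. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A, B, C, D \in \mathcal{F}$. Suppose that $P(A) = 0.6$, $P(B) = 0.5$, $P(C) = 0.4$, $P(A \cap B) = 0.3$, $P(A \cap C) = 0.2$, $P(B \cap C) = 0.2$, $P(A \cap B \cap C) = 0.1$, $P(B \cap D) = P(C \cap D) = 0$, $P(A \cap D) = 0.1$ and $P(D) = 0.2$. Find:

(i) $P(A \cup B \cup C)$ and $P(A^c \cap B^c \cap C^c)$;

(ii) $P((A \cup B) \cap C)$ and $P(A \cup (B \cap C))$;

(iii) $P((A^c \cup B^c) \cap C^c)$ and $P((A^c \cap B^c) \cup C^c)$;

(iv) $P(B \cap C \cap D)$ and $P(A \cap C \cap D)$;

(v) $P(A \cup B \cup D)$ and $P(A \cup B \cup C \cup D)$;

(vi) $P((A \cap B) \cup (C \cap D))$.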

8. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A$ and $B$ be two events (i.e., $A, B \in \mathcal{F}$).

(i) Show that the probability that exactly one of the events $A$ or $B$ will occur is given by $P(A) + P(B) - 2P(A \cap B)$;

(ii) Show that $P(A \cap B) - P(A)P(B) = P(A)P(B^c) - P(A \cap B^c) = P(A^c)P(B) - P(A^c \cap B) = P((A \cup B)^c) - P(A^c)P(B^c)$.

9. Suppose that $n\ (\geq 3)$ persons $P_1, \ldots, P_n$ are made to stand in a row at random. Find the probability that there are exactly $r$ persons between $P_1$ and $P_2$; here $r \in \{1, 2, \ldots, n-2\}$.

10. A point $(X, Y)$ is randomly chosen on the unit square $S = \{(x, y) : 0 \leq x \leq 1, 0 \leq y \leq 1\}$ (i.e., for any region $R \subseteq S$ for which the area is defined, the probability that $(X, Y)$ lies in $R$ is $\frac{\text{area of } R}{\text{area of } S}$). Find the probability that the distance from $(X, Y)$ to the nearest side does not exceed $\frac{1}{3}$ units.

11. Three numbers $a$, $b$ and $c$ are chosen at random and with replacement from the set $\{1, 2, \ldots, 6\}$. Find the probability that the quadratic equation $ax^2 + bx + c = 0$ will have real root(s).

12. Three numbers are chosen at random from the set $\{1, 2, \ldots, 50\}$. Find the probability that the chosen numbers are in

(i) arithmetic progression;

(ii) geometric progression.

13. Consider an empty box in which four balls are to be placed (one-by-one) according to

the following scheme. A fair die is cast each time and the number of dots on the upper

face is noted. If the upper face shows up 2 or 5 dots then a white ball is placed in the

box. Otherwise a black ball is placed in the box. Given that the first ball placed in the box

was white find the probability that the box will contain exactly two black balls.

14. Let $((0, 1], \mathcal{F}, P)$ be a probability space such that $\mathcal{F}$ is the smallest sigma-field containing all subintervals of $\Omega = (0, 1]$ and $P((a, b]) = b - a$, where $0 \leq a < b \leq 1$ (such a probability measure is known to exist).

(i) Show that $\{b\} = \bigcap_{n=1}^{\infty} \big(b - \frac{1}{n+1}, b\big]$, $\forall\, b \in (0, 1]$;

(ii) Show that $P(\{b\}) = 0$, $\forall\, b \in (0, 1]$, and $P((0, 1)) = 1$ (note that here $P(\{b\}) = 0$ but $\{b\} \neq \emptyset$, and $P((0, 1)) = 1$ but $(0, 1) \neq \Omega$);

(iii) Show that, for any countable set $A \in \mathcal{F}$, $P(A) = 0$;

(iv) For $n \in \mathbb{N}$, let $A_n = \big(0, \frac{1}{n}\big]$ and $B_n = \big(\frac{1}{2} + \frac{1}{n+2}, 1\big]$. Verify that $A_n \downarrow$, $B_n \uparrow$, $P(\operatorname{Lim}_{n \to \infty} A_n) = \lim_{n \to \infty} P(A_n)$ and $P(\operatorname{Lim}_{n \to \infty} B_n) = \lim_{n \to \infty} P(B_n)$.

15. Consider four coding machines $M_1$, $M_2$, $M_3$ and $M_4$ producing binary codes 0 and 1. The machine $M_1$ produces codes 0 and 1 with respective probabilities $\frac{1}{4}$ and $\frac{3}{4}$. The code produced by machine $M_k$ is fed into machine $M_{k+1}$ $(k = 1, 2, 3)$, which may either leave the received code unchanged or may change it. Suppose that each of the machines $M_2$, $M_3$ and $M_4$ changes the received code with probability $\frac{3}{4}$. Given that the machine $M_4$ has produced code 1, find the conditional probability that the machine $M_1$ produced code 0.

16. A student appears in the examinations of four subjects: Biology, Chemistry, Physics and Mathematics. Suppose that the probabilities of the student clearing the examinations in these subjects are $\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{4}$ and $\frac{1}{5}$ respectively. Assuming that the performances of the student in the four subjects are independent, find the probability that the student will clear the examination(s) of

(i) all the subjects; (ii) no subject; (iii) exactly one subject;

(iv) exactly two subjects; (v) at least one subject.

17. Let $A$ and $B$ be independent events. Show that

$$\max\{P((A \cup B)^c),\ P(A \cap B),\ P(A \Delta B)\} \geq \frac{4}{9},$$

where $A \Delta B = (A - B) \cup (B - A)$.

18. For independent events $A_1, \ldots, A_n$, show that:

$$P\Big(\bigcap_{i=1}^{n} A_i^c\Big) \leq e^{-\sum_{i=1}^{n} P(A_i)}.$$

19. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A_1, A_2, \ldots$ be a sequence of events (i.e., $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$). Define $B_n = \bigcap_{i=n}^{\infty} A_i$, $C_n = \bigcup_{i=n}^{\infty} A_i$, $n = 1, 2, \ldots$, $D = \bigcup_{n=1}^{\infty} B_n$ and $E = \bigcap_{n=1}^{\infty} C_n$. Show that:

(i) $D$ is the event that all but a finite number of the $A_n$'s occur and $E$ is the event that infinitely many $A_n$'s occur;

(ii) $D \subseteq E$;

(iii) $P(E^c) = \lim_{n \to \infty} P(C_n^c) = \lim_{n \to \infty} \lim_{m \to \infty} P\big(\bigcap_{k=n}^{m} A_k^c\big)$ and $P(E) = \lim_{n \to \infty} P(C_n)$;

(iv) if $\sum_{n=1}^{\infty} P(A_n) < \infty$ then, with probability one, only finitely many $A_n$'s will occur;

(v) if $A_1, A_2, \ldots$ are independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$ then, with probability one, infinitely many $A_n$'s will occur.

20. Let $A$, $B$ and $C$ be three events such that $A$ and $B$ are negatively (positively) associated and $B$ and $C$ are negatively (positively) associated. Can we conclude that, in general, $A$ and $C$ are negatively (positively) associated?

21. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A$ and $B$ be two events (i.e., $A, B \in \mathcal{F}$). Show that if $A$ and $B$ are positively (negatively) associated then $A$ and $B^c$ are negatively (positively) associated.

22. A locality has $n$ houses numbered $1, \ldots, n$ and a terrorist is hiding in one of these houses. Let $H_j$ denote the event that the terrorist is hiding in the house numbered $j$, $j = 1, \ldots, n$, and let $P(H_j) = p_j \in (0, 1)$, $j = 1, \ldots, n$. During a search operation, let $F_j$ denote the event that a search of house number $j$ will fail to nab the terrorist there and let $P(F_j \mid H_j) = r_j \in (0, 1)$, $j = 1, \ldots, n$. For each $i, j \in \{1, \ldots, n\}$, $i \neq j$, show that $H_j$ and $F_j$ are negatively associated but $H_i$ and $F_j$ are positively associated. Interpret these findings.

23. Let $A$, $B$ and $C$ be three events such that $P(B \cap C) > 0$. Prove or disprove each of the following:

(i) $P(A \cap B \mid C) = P(A \mid B \cap C) P(B \mid C)$;

(ii) $P(A \cap B \mid C) = P(A \mid C) P(B \mid C)$ if $A$ and $B$ are independent events.

24. A $k$-out-of-$n$ system is a system comprising $n$ components that functions if, and only if, at least $k$ $(k \in \{1, 2, \ldots, n\})$ of the components function. A $1$-out-of-$n$ system is called a parallel system and an $n$-out-of-$n$ system is called a series system. Consider $n$ components $C_1, \ldots, C_n$ that function independently. At any given time $t$ the probability that the component $C_i$ will be functioning is $p_i(t)$ $(\in (0, 1))$ and the probability that it will not be functioning at time $t$ is $1 - p_i(t)$, $i = 1, \ldots, n$.

(i) Find the probability that a parallel system comprising components $C_1, \ldots, C_n$ will function at time $t$;

(ii) Find the probability that a series system comprising components $C_1, \ldots, C_n$ will function at time $t$;

(iii) If $p_i(t) = p(t)$, $i = 1, \ldots, n$, find the probability that a $k$-out-of-$n$ system comprising components $C_1, \ldots, C_n$ will function at time $t$.