
    Probability

    Dr. Michael Hinz, Bielefeld University

WS 2013/2014, Mon 16-18 T2-233, Tue 16-18 T2-204


    Contents

1 Introduction

2 Discrete probability spaces
  2.1 Basic notions
  2.2 Conditional probability and independence
  2.3 Discrete random variables
  2.4 Bernoulli trials

3 Absolutely continuous distributions

4 Measure theory and Lebesgue integration

5 Product spaces and independence

6 Weak convergence and the Central Limit Theorem

7 Conditioning

8 Martingale theory


    Chapter 1

    Introduction

Any form of intelligent life - and a human being in particular - extracts information from the sensorial input provided by the immediate environment. This information is observed and then processed further. Its evaluation (e.g. danger or reward) allows us to make decisions and to establish appropriate routines and reactions (e.g. flight or cooperation). The single most important mechanism in evaluating information is trial and error. But on a philosophical level, this is nothing but testing a hypothesis by an experiment:

An experiment is an orderly procedure carried out with the goal of verifying, refuting or establishing the validity of a hypothesis. (URL: http://en.wikipedia.org/wiki/Experiment)


A hypothesis is a proposed explanation for an observed phenomenon. (URL: http://en.wikipedia.org/wiki/Hypotheses)

Sometimes the idea of trial and error - the source of all intelligent behaviour - is shamelessly disregarded. This also happens often in the educational system, where the terrible habit of considering mistakes as something bad is widespread. This is illogical, because without mistakes there cannot be any learning process, any acquisition of reliable knowledge! After having completed our learning process up to an adequate level we of course hope to have established precise and correct thinking.

Maybe an experiment has more to do with perception than with physical reality. This is an issue people discuss in constructivism. But in any case we should keep in mind that what exactly the experiment is, what its possible outcomes are, and what the hypothesis to be tested is will always be a matter of agreement.

A sort of trivial experiment is an experiment with a predetermined outcome. In this case it suffices to perform the experiment just once to decide whether the hypothesis is true or false. For example, we may throw a ball in the air and wait for things to happen. We assume our strength is finite and the mass of the ball is positive. According to classical mechanics it will sooner or later fall down. The hypothesis "the ball stays in the air forever" can be disproved at once. (Note however that we made silent and intrinsic assumptions: "sooner or later fall down" might suggest that we are standing on or above the surface of the earth, and that the earth will not cease to exist while the ball is in the air. In other words: we have to explain what we mean by common sense.)

In more interesting cases the experiment has several possible outcomes, i.e. its outcome is not predetermined. It is reasonable to assume that the collection of all possible outcomes of the experiment is known. For example, we may toss a coin (to be precise, an ideal coin that will always land on a face and never remain standing on its edge). The hypothesis "the coin always shows heads" can be disproved, but we may need to perform our experiment several times to do so.


The (theoretical) design of our experiment depends on the amount of information we can observe and on the sort of information we would like to extract. Suppose, for example, we record the temperature in a certain geographical location. Suppose that measurements are taken daily over the period of one year of 365 days and that we are close to the equator, so that seasons do not play a role. Assume that our thermometer only yields integer numbers. If we are interested in a single large set of data consisting of 365 numbers, then this set may be considered an outcome of the experiment, and no further discussion is needed. But this is interesting only if our hypothesis necessarily has to be phrased in terms of 365 numbers. If we would like to compare two locations, it may be more intelligent to ask for an average temperature, i.e. one single number. This amounts to the idea of having an experiment whose possible outcomes are integer numbers and which is performed repeatedly, 365 times. If we plot the absolute numbers of the occurrences of certain temperatures in a histogram, then we may try to conclude that they obey a certain distribution. In other words: we think of our experiment as an experiment with random outcome and ask for a probabilistic structure. This is filtering information: for instance, this idea does not tell whether it was 20°C today or yesterday.

When an experiment is performed repeatedly, two ideas arise:

We may want to record, evaluate and interpret the obtained data. For instance, we may want to look at a time series to see whether we can determine distributions or dependencies. Observing and interpreting are the objectives of statistics.

We may want to model the phenomenon and extract further information from the model: once we know the physical system under observation shows typical statistical features, we can try to come up with a probabilistic model. This model allows further theoretical conclusions (independent of further physical observations) and may be used to forecast the future behaviour of the system. The model and the deduction of further information are the task of probability.

In this sense probability is talking about models for experiments with several possible outcomes. We do not talk about reality, whatever reality is. Whether a probabilistic model is considered appropriate for a physical experiment is usually decided in subsequent steps of statistical testing or simply by common sense. But in any case this is not a question to be decided by mathematics alone.

Another interpretation of assigning probabilities to certain events is to think of them as a measure of belief; for instance, think of opinion polls. Luckily, this idea can be captured by exactly the same mathematical framework as used to model experiments, so we don't have to come up with yet another theory.

As far as we know, probability theory started in the sixteenth and seventeenth century (Cardano, Fermat, Pascal, Bernoulli, etc.; Bernoulli's Ars conjectandi appeared in 1713), mainly to analyze games of chance. Based on earlier results a great body of work was established in the nineteenth century (Euler, Lagrange, Gauss, Laplace, etc.; Laplace's Théorie analytique des probabilités appeared in 1812). This is what we refer to as classical probability. It is already very useful and intuitive but still limited to modelling experiments that have at most a countable number of possible outcomes or can be described by probability measures that have densities (calculus-based methods, Riemann integration). We will follow this spirit in the first part of our lecture, and most concepts of probabilistic thinking can already be established at this level.

Later in the nineteenth century other areas of mathematics bloomed, for instance set theory (Cantor, Dedekind, etc.), while at the same time probability theory was nearly unknown to the general public. An encyclopedia from around 1890 says that mathematical probability is the ratio with numerator being the number of fortunate cases and denominator being the number of all possible cases. It also describes an urn model, says that an increasing number of experimental runs improves the quality of observations using empirical probabilities, refers to Bernoulli and Laplace, and relates probability theory to Gauss' least squares method. Finally it mentions a couple of contemporary textbooks on probability. But compared to what the encyclopedia tells about geometry, algebra or analysis, the article is extremely short.


In the course of the nineteenth century mathematicians started to understand that it would be clever to base probability on set theory (Hausdorff's Grundzüge der Mengenlehre of 1914 was an important influence for probability theory), because then abstract and general models for experiments with an uncountable number of outcomes could be formulated and investigated. This dream came with the question of how to determine the probability of a set (called an event), and after an intense discussion people realized that determining the probability of an event should be more or less the same as measuring the length, area or volume of a subset of Euclidean space. This led to the concept of measure theory and to an axiomatization of probability theory based on measure theory (along with Lebesgue integration), developed in the early years of the last century (Hausdorff, von Mises, Wiener, Kolmogorov, etc.) and commonly referred to as modern probability. Usually its invention is attributed to Kolmogorov (Kolmogorov's Grundbegriffe der Wahrscheinlichkeitsrechnung appeared in 1933), but he himself remarked that these ideas had been around for some time. Modern probability is an incredibly flexible theory, and closely related developments in physics or economics (Bachelier, Einstein, Smoluchowski, Wiener, Birkhoff, Bose, Boltzmann, etc.) went parallel. Because measure theory is a deep and somewhat abstract business close to the axiomatic roots of mathematics, we will only have a peek into some of its concepts.

In terms of its strength and flexibility, probability theory evolved in two, maybe even three steps: discrete theory, theory based on calculus, and modern theory based on measure theory. It still is a relatively young mathematical subject.


    Chapter 2

    Discrete probability spaces

Mon, Oct 14, 2013

In this chapter we have a look at key notions and ideas of probability with a minimum of technicalities. This is possible for models that use discrete probability spaces.

    2.1 Basic notions

    We use mathematical set theory to model an experiment.

There are different axiomatic systems mathematics can be based upon (mathematics is neither true nor false, as shown by Kurt Gödel around 1930, inspired by David Hilbert and Hans Hahn), and usually the notion of a set is explained within the chosen axiomatic system (for instance, the so-called Zermelo-Fraenkel axioms). An earlier and less rigorous definition (at that time the axiomatic method was not yet understood) was given by Georg Cantor in the late nineteenth century:

A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought - which are called elements of the set.

For our purposes let us please agree to accept this definition of the notion of a set.

Exercise 2.1.1. Study or review the customary notation and verbalization of set theory (including the power set), set relations and operations and related notions, Venn diagrams, and the connection to logical operations.


The collection of all possible outcomes of an experiment is represented by a nonempty set Ω, called the sample space. Its elements ω ∈ Ω are called samples or outcomes.

    Examples 2.1.1.

(1) Tossing a coin: A reasonable model is Ω = {H, T}, where H stands for heads and T for tails. The possible outcomes in this model are H and T.

(2) Throwing a die: Common sense suggests Ω = {1, 2, 3, 4, 5, 6}.

(3) Life span of a device or a living organism: A natural choice is Ω = [0, +∞); the possible outcomes are all nonnegative real numbers 0 ≤ t < +∞. The outcome 0 ∈ [0, +∞) means broken from the beginning or born dead.

(4) Random choice of a number in [0, 1]: Of course Ω = [0, 1].

    Exercise 2.1.2.

    (1) Find an appropriate sample space for tossing a coin five times.

    (2) Find an appropriate sample space for two people playing rock-paper-scissors.

A function (or map or mapping) f : Ω → Ω′ from a set Ω to a set Ω′ is a subset of the cartesian product

    Ω × Ω′ := {(ω, ω′) : ω ∈ Ω, ω′ ∈ Ω′}

such that any ω ∈ Ω is the first component of exactly one ordered pair (ω, ω′). If ω is the first component of (ω, ω′), called the argument, we write f(ω) to denote the second component ω′, called the value.


A function f : Ω → Ω′ is called injective if f(ω_1) = f(ω_2) implies ω_1 = ω_2. (Injectivity forbids two different arguments to have the same value.)

A set Ω is called countable if there is an injective function from Ω to the natural numbers ℕ. Such an injective map is called an enumeration of our set.

Definition 2.1.1. A sample space Ω is called discrete if it is countable.

If Ω is discrete we can write Ω = {ω_j}_{j=1}^∞, i.e. there exists an enumeration for Ω.


Examples 2.1.2. The sample spaces in Example 2.1.1 (1) and (2) are discrete, but those in (3) and (4) are not.

Definition 2.1.2. A subset A of a discrete sample space Ω is called an event. An event of the form {ω} with ω ∈ Ω is called an elementary event.

Note that an event is a set A ⊆ Ω consisting of certain outcomes ω. When the experiment is practically (or mentally) carried out, it yields a certain sample (outcome) ω ∈ Ω. We say that the event A takes place or occurs or is realized with this sample if ω ∈ A.

    Examples 2.1.3.

(1) Tossing a coin: The events are ∅, Ω, {H} and {T}.

(2) When throwing a die the idea to obtain an even number is represented by the event A := {2, 4, 6} ⊆ {1, 2, 3, 4, 5, 6}.

    Exercise 2.1.3. Write down all events for tossing a coin.

Given two events A, B ⊆ Ω, we say that A or B occurs if A ∪ B occurs. Similarly, we say that A and B occur if A ∩ B occurs.

Exercise 2.1.4. When throwing a die, what is the event that the outcome is an even number and greater than two? What is the event that it is an even number or greater than two?

Now some events may be particularly interesting for us, and we might want to somehow assign them a number telling how likely their occurrence is.

    Examples 2.1.4.

(1) Maybe {H} is interesting when tossing a coin to make an otherwise difficult decision by chance.

(2) When throwing a die we might want to know how likely it is to obtain an even number. That would be the probability of {2, 4, 6}.

If the model is appropriate for the experiment, then the probability of an event A should be close to the relative frequency (sometimes also called empirical probability) observed when performing the experiment repeatedly and in such a way that the different trials do not influence each other. If n denotes the total number of trials and n_A the number of trials in which the event A occurs, then the relative frequency is given by the ratio

    n_A / n.

Writing P(A) for the probability of A (which is yet to be defined at this point), the best we could hope for is to observe the limit relation

    lim_{n→∞} n_A / n = P(A).
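Numerically this stabilization is easy to watch. The following small Python sketch (added for illustration, not part of the original script; it assumes a fair coin, so P({H}) = 1/2) tracks the relative frequency of heads as n grows:

    import random

    random.seed(2013)                # fixed seed, so the run is reproducible
    n_heads = 0
    for n in range(1, 100001):
        if random.random() < 0.5:    # one toss of a fair coin
            n_heads += 1
        if n in (10, 100, 1000, 10000, 100000):
            print(n, n_heads / n)    # n_A/n settles down near P({H}) = 0.5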

While the sample space is often dictated by the design of the experiment or by common sense, the question of which probabilities to assign to an event really is a matter of modelling. The following notion already provides a complete mathematical model for many applications.

Definition 2.1.3. Let Ω = {ω_j}_{j=1}^∞ be a discrete sample space. A probability measure P on the discrete sample space Ω is a countable collection {p_j}_{j=1}^∞ of numbers p_j ≥ 0 with ∑_{j=1}^∞ p_j = 1. The probability of an event A ⊆ Ω is given by

    P(A) := ∑_{j≥1: ω_j∈A} p_j.

To (Ω, P) we refer as a discrete probability space.

If Ω = {ω_j}_{j=1}^∞, {p_j}_{j=1}^∞ and P are as in the definition, then obviously

    P({ω_j}) = p_j,   j = 1, 2, ...,

are the probabilities of the elementary events {ω_j}.

Remark 2.1.1. This is a definition in the style of classical probability. When dealing with uncountably infinite sample spaces (i.e. to describe experiments with a continuum of possible outcomes, such as determining the lifespan of an organism), it is usually impossible to come up with a reasonable way of assigning a probability to each subset A ⊆ Ω. Later we will see how to fix this by regarding only a proper subset F of the power set P(Ω) as the collection of events.
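To make the definition concrete, here is a minimal Python sketch (an added illustration, using the fair-die weights from the examples further below as an assumption): a discrete probability measure is just a table of weights, and P(A) is computed by summation.

    from fractions import Fraction

    # outcomes omega_j mapped to weights p_j >= 0 that sum to 1 (a fair die)
    p = {omega: Fraction(1, 6) for omega in range(1, 7)}

    def prob(event):
        # P(A) := sum of p_j over all omega_j in A
        return sum(p[omega] for omega in event)

    assert sum(p.values()) == 1
    print(prob({2, 4, 6}))   # P(even) = 1/2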

    We list a couple of properties of probability measures.


Lemma 2.1.1. Let (Ω, P) be a discrete probability space.

(i) P(Ω) = 1 and P(∅) = 0.

(ii) If A ⊆ B then P(A) ≤ P(B).

(iii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

(iv) P(Aᶜ) = 1 − P(A).

(v) P(⋃_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ P(A_i).

(vi) If A_1, A_2, A_3, ... are pairwise disjoint events (i.e. if A_i ∩ A_k = ∅ whenever i ≠ k) then

    P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

Proof. According to the definition we have P(Ω) = ∑_{j=1}^∞ p_j = 1, because ω_j ∈ Ω for all j, and P(∅) = 0, because there is no j such that ω_j ∈ ∅. This is (i). If A ⊆ B we have

    ∑_{j≥1: ω_j∈A} p_j ≤ ∑_{j≥1: ω_j∈B} p_j,

which gives (ii). For (iii) note that

    ∑_{j≥1: ω_j∈A∪B} p_j = ∑_{j≥1: ω_j∈A} p_j + ∑_{j≥1: ω_j∈B} p_j − ∑_{j≥1: ω_j∈A∩B} p_j

(subtracting the last sum ensures each index j with ω_j ∈ A ∩ B appears exactly once on the right hand side of the equality). A special case of (iii) together with (i) gives (iv). Finally, we have

    ∑_{j≥1: ω_j∈⋃_{i=1}^∞ A_i} p_j ≤ ∑_{i=1}^∞ ∑_{j≥1: ω_j∈A_i} p_j

with equality if the A_i are pairwise disjoint. This proves (v) and (vi).

Remark 2.1.2. Given a discrete probability space (Ω, P), we should try to think of the probability measure P as a function P : P(Ω) → [0, 1] from the power set P(Ω) of Ω into the unit interval [0, 1]. This function is normed by (i) and monotone by (ii), and the properties (v) and (vi) are called σ-subadditivity and σ-additivity, respectively.


Examples 2.1.5.

(1) Tossing a fair coin: Ω = {H, T}, P({H}) = P({T}) = 1/2.

(2) Throwing a fair die: Ω = {1, ..., 6}, P({ω}) = p_1 = ... = p_6 = 1/6.

(3) Throwing an unfair (biased, marked) die: Ω = {1, ..., 6}, P({6}) = p_6 = 1/2 and P({ω}) = p_1 = ... = p_5 = 1/10 for ω ∈ {1, 2, 3, 4, 5}. Of course there are also many other ways to make a die unfair. But whether it is fair or not should be reflected by the choice of the probability measure.

(4) Tossing two fair coins at once (or tossing one coin twice but without any influence between the two trials): Ω = {(H, H), (H, T), (T, H), (T, T)} and P({ω}) = 1/4 for all ω ∈ Ω.

(5) To open a safe one needs to know the correct code consisting of 4 digits, each between 0 and 9. We would like to know the probability of finding it by chance. In this case

    Ω = {(0, 0, 0, 0), (0, 0, 0, 1), ..., (9, 9, 9, 8), (9, 9, 9, 9)}

has 10⁴ different elements, and common sense suggests that in a model for this problem each of these four-tuples should have the same probability, namely 10⁻⁴. Hence the probability to find the correct code is 10⁻⁴.

(6) The letters A, B, C and D are randomly arranged in a line, each order being equally likely. Then the probability to get the string ADCB is 1/4! = 1/24. The probability space for this example is made up by all possible permutations of these letters,

    Ω = {ABCD, ABDC, ACBD, ..., DCBA}.

(7) Suppose we have limited space and want to invite 20 out of our 100 friends to a party. We cannot decide and do it randomly, but in a fair manner (we do not prefer any combination). Given a fixed choice of 20 friends, the probability that exactly these friends will be invited is C(100, 20)⁻¹, where C(100, 20) = 100!/(20! 80!) is the number of ways to choose 20 out of 100. (This is a small number; maybe we had better make a more emotional, non-random choice ...) Here the probability space consists of all combinations of 20 out of 100 elements, but it would be tedious to write it down more explicitly. However, we know it has C(100, 20) elements.


(8) In a group there are 10 women and 10 men. We would like to form a gender balanced team of four people. We do it in a fair manner and ask for the probability that some fixed, preferred team we have in mind will be the chosen one. The sample space could be thought of as consisting of elements ((m_1, m_2), (w_1, w_2)), where (m_1, m_2) is a randomly chosen combination of two out of ten men, and (w_1, w_2) is a randomly chosen combination of two out of ten women. For each of these two choices there are C(10, 2) possibilities, hence our sample space will have C(10, 2) · C(10, 2) elements. The probability to choose the preferred team is (C(10, 2) · C(10, 2))⁻¹. (A short computational check of examples (5)-(8) follows below.)
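The counting in examples (5) through (8) can be double-checked with a few lines of Python (a sketch added for illustration, not part of the original script):

    from math import comb, factorial

    print(1 / 10**4)                        # (5) guessing the safe code
    print(1 / factorial(4))                 # (6) the string ADCB among all orders
    print(1 / comb(100, 20))                # (7) one fixed choice of 20 out of 100 friends
    print(1 / (comb(10, 2) * comb(10, 2)))  # (8) one fixed gender balanced team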

Exercise 2.1.5. Why is it (strictly speaking) wrong to write P(H) in Example 2.1.5 (1) or P(1) in (2)?

Please be careful. Some textbooks, encyclopedias or articles will nevertheless write these wrong expressions. This is often done to have a simplified (short-hand) notation, but with the agreement (often written somewhere, but sometimes silent) that this is willingly written to replace the mathematically correct expression.

Remark 2.1.3. The best way to solve modelling problems is: first determine the sample space, and afterwards come up with a reasonable probability measure on it. Once the sample space is written down correctly, the subsequent parts of the problem become much simpler.

Exercise 2.1.6. To design a new product, 10 items have been produced; 8 of them are of high quality but 2 are defective. In a test we randomly pick two of these 10 items. No item is preferred, and we do not place a drawn item back. How likely is it that our random choice gives one high quality and one defective item?

Tue, Oct 15, 2013

Lemma 2.1.2. (Inclusion-exclusion principle) Let (Ω, P) be a discrete probability space and let A_1, A_2, ..., A_n be events. Then

    P(⋃_{i=1}^n A_i) = ∑_{i=1}^n P(A_i) − ∑_{1≤i_1<i_2≤n} P(A_{i_1} ∩ A_{i_2})
                     + ∑_{1≤i_1<i_2<i_3≤n} P(A_{i_1} ∩ A_{i_2} ∩ A_{i_3}) − ...
                     + (−1)^{n+1} P(A_1 ∩ ... ∩ A_n).


This generalizes Lemma 2.1.1 (iii). For n = 3 and three events A, B, C Lemma 2.1.2 gives

    P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
                 − P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
                 + P(A ∩ B ∩ C).
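Both sides of this identity can be compared numerically (an added sketch, assuming the fair-die measure from Examples 2.1.5 and three arbitrary events):

    from fractions import Fraction

    p = {omega: Fraction(1, 6) for omega in range(1, 7)}   # fair die
    P = lambda event: sum(p[omega] for omega in event)

    A, B, C = {2, 4, 6}, {4, 5, 6}, {1, 2}
    lhs = P(A | B | C)
    rhs = (P(A) + P(B) + P(C)
           - P(A & B) - P(A & C) - P(B & C)
           + P(A & B & C))
    print(lhs, rhs, lhs == rhs)   # 5/6 5/6 True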

    Exercise 2.1.7. Draw the corresponding Venn diagram for n = 3.

    Exercise 2.1.8. Review or study the basic idea of proof by induction.

Proof. We proceed by induction. For n = 1 there is nothing to prove, and for n = 2 the statement is known to be true by Lemma 2.1.1 (iii). We assume it is true for n; this is called the induction hypothesis. We use the induction hypothesis to prove the statement of the lemma for n + 1. If we manage to do so, it must be true for all natural numbers n and all choices of events A_1, ..., A_n, as desired.

Given n + 1 events A_1, ..., A_n, A_{n+1}, we observe

    P(⋃_{i=1}^{n+1} A_i) = P((⋃_{i=1}^n A_i) ∪ A_{n+1})
                        = P(⋃_{i=1}^n A_i) + P(A_{n+1}) − P((⋃_{i=1}^n A_i) ∩ A_{n+1})

by Lemma 2.1.1 (iii). The distributivity rule of set operations tells us that

    (⋃_{i=1}^n A_i) ∩ A_{n+1} = ⋃_{i=1}^n (A_i ∩ A_{n+1}),

and by the induction hypothesis this event has probability

    P(⋃_{i=1}^n (A_i ∩ A_{n+1})) = ∑_{i=1}^n P(A_i ∩ A_{n+1}) − ∑_{1≤i_1<i_2≤n} P(A_{i_1} ∩ A_{i_2} ∩ A_{n+1})
                                 + ... + (−1)^{n+1} P(A_1 ∩ ... ∩ A_n ∩ A_{n+1}).


Using the induction hypothesis once more, now on ⋃_{i=1}^n A_i, we obtain

    P(⋃_{i=1}^{n+1} A_i) = ∑_{i=1}^n P(A_i) + P(A_{n+1})
                         − ∑_{1≤i_1<i_2≤n} P(A_{i_1} ∩ A_{i_2}) − ∑_{i=1}^n P(A_i ∩ A_{n+1})
                         + ∑_{1≤i_1<i_2<i_3≤n} P(A_{i_1} ∩ A_{i_2} ∩ A_{i_3}) + ∑_{1≤i_1<i_2≤n} P(A_{i_1} ∩ A_{i_2} ∩ A_{n+1})
                         − ... + (−1)^{n+2} P(A_1 ∩ ... ∩ A_{n+1}).

Grouping the sums that involve intersections of the same number of events turns the right hand side into the claimed formula for the n + 1 events A_1, ..., A_{n+1}, which completes the induction step.


As an application of Lemma 2.1.2 we ask for the probability that a randomly chosen permutation σ : {1, 2, ..., n} → {1, 2, ..., n} has a fixed point (i.e. some element i ∈ {1, 2, ..., n} with σ(i) = i).

It makes sense to set the sample space Ω to be the space of all different permutations σ of {1, 2, ..., n}. It has |Ω| = n! elements. If E_i := {σ ∈ Ω : σ(i) = i} denotes the event that i is a fixed point, then

    P(E_{i_1} ∩ ... ∩ E_{i_k}) = (n − k)!/n!

for all 1 ≤ i_1 < i_2 < ... < i_k ≤ n. Therefore

    ∑_{1≤i_1<...<i_k≤n} P(E_{i_1} ∩ ... ∩ E_{i_k}) = C(n, k) · (n − k)!/n! = 1/k!,

and Lemma 2.1.2 yields

    P(⋃_{i=1}^n E_i) = ∑_{k=1}^n (−1)^{k+1}/k!,

which tends to 1 − e⁻¹ ≈ 0.632 as n → ∞.
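For small n the formula is easy to verify by enumerating all permutations (an added Python sketch):

    from itertools import permutations
    from math import factorial

    n = 6
    count = sum(any(sigma[i] == i for i in range(n))    # does sigma have a fixed point?
                for sigma in permutations(range(n)))
    empirical = count / factorial(n)
    formula = sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))
    print(empirical, formula)   # both 0.6319..., already close to 1 - 1/e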


Computations like the preceding one involve limits of events, and probability measures behave well under monotone limits.

Lemma 2.1.3. (Continuity of probability measures) Let (Ω, P) be a discrete probability space.

(i) If (A_n)_{n=1}^∞ is an increasing sequence of events (i.e. A_n ⊆ A_{n+1} for all n), then

    lim_{n→∞} P(A_n) = P(⋃_{n=1}^∞ A_n).

(ii) If (A_n)_{n=1}^∞ is a decreasing sequence of events (i.e. A_{n+1} ⊆ A_n for all n), then

    lim_{n→∞} P(A_n) = P(⋂_{n=1}^∞ A_n).

Proof. To see (i) set

    B_1 := A_1,  B_2 := A_2 \ B_1,  ...,  B_n := A_n \ (⋃_{i=1}^{n−1} B_i),  ...

Note that the B_i are pairwise disjoint, ⋃_{i=1}^∞ B_i = ⋃_{i=1}^∞ A_i, and ⋃_{i=1}^n B_i = ⋃_{i=1}^n A_i = A_n for all n. Then

    P(⋃_{i=1}^∞ A_i) = P(⋃_{i=1}^∞ B_i) = ∑_{i=1}^∞ P(B_i) = lim_{n→∞} ∑_{i=1}^n P(B_i) = lim_{n→∞} P(⋃_{i=1}^n B_i) = lim_{n→∞} P(A_n).

Statement (ii) is left as an exercise; just use complements.
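Statement (i) can be watched numerically (an added sketch, assuming the geometric weights p_j = 2^(−j) on Ω = {1, 2, 3, ...} as an example measure):

    # increasing events A_n = {1, ..., n}; P(A_n) should increase to P(union) = 1
    def P_A(n):
        return sum(2.0 ** (-j) for j in range(1, n + 1))

    for n in (1, 2, 5, 10, 20, 50):
        print(n, P_A(n))   # values increase monotonically towards 1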

    2.2 Conditional probability and independence

Sometimes we will only have partial information about an experiment; for instance, we may have to impose or assume certain conditions in order to discuss statistical features. If these conditions are nonrandom, we can somehow ignore them by choosing an adequate model. But sometimes these conditions themselves are random, i.e. varying with the outcome of the experiment, and we need an appropriate probabilistic model to reflect this.

Examples 2.2.1. Consider a group of one hundred adults; twenty are women and eighty are men, fifteen of the women are employed and twenty of the men are employed. We randomly select a person and find out whether she or he is employed. If employed, what is the probability that the selected person is a woman?

Without the information on employment, we would have selected a woman with probability 20/100 = 1/5.

Knowing the selected person is employed, we can pass to another sample space, now consisting of the thirty-five employed people, and obtain the probability 15/35 = 3/7 of having selected a woman.


In general it is more flexible to keep the sample space fixed but to implement given information by conditioning.

Definition 2.2.1. Let (Ω, P) be a discrete probability space. Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as

    P(A|B) := P(A ∩ B) / P(B).

Examples 2.2.2. For the previous example, set

    A := {a woman is selected}  and  B := {the selected person is employed}

to see P(A|B) = 3/7.

The next lemma is (almost) obvious.

Lemma 2.2.1. Let (Ω, P) be a discrete probability space and B an event with P(B) > 0. Then P(·|B) is again a probability measure. We have P(B|B) = P(Ω|B) = 1. Moreover, statements (i)-(vi) of Lemma 2.1.1 and the statement of Lemma 2.1.2 remain valid for P(·|B) in place of P.

    More specific rules for conditional probability are the following.

Lemma 2.2.2. Let (Ω, P) be a discrete probability space.

(i) If A and B both are events with positive probability, then

    P(B|A) = P(A|B) P(B) / P(A).

(ii) If B_1, ..., B_n are pairwise disjoint events of positive probability and such that Ω = ⋃_{i=1}^n B_i, then we have

    P(A) = ∑_{i=1}^n P(A|B_i) P(B_i)

for any event A.


(iii) If B_1, B_2, ... are pairwise disjoint events of positive probability and such that Ω = ⋃_{i=1}^∞ B_i, then we have

    P(A) = ∑_{i=1}^∞ P(A|B_i) P(B_i)

for any event A.

(iv) For any events A_1, ..., A_n with P(⋂_{i=1}^{n−1} A_i) > 0 we have

    P(⋂_{i=1}^n A_i) = P(A_1) P(A_2|A_1) P(A_3|A_1 ∩ A_2) ··· P(A_n | ⋂_{i=1}^{n−1} A_i).

    Exercise 2.2.1. Prove Lemma 2.2.2.

(ii) and (iii) are called the law of total probability, and (i) is referred to as Bayes' rule.

Examples 2.2.3. We consider a medical test. Patients are being tested for a disease. For a patient sick with the disease the test will be positive in 99% of all cases. In 2% of all cases a healthy patient is tested positive. Statistical data show that one out of a thousand patients really gets sick with the disease. We would like to know the probability that a tested patient indeed is sick.

We write S for the event {patient is sick}, + for {patient is tested positive} and − for {patient is tested negative}. We know that

    P(S) = 0.001,   P(+|S) = 0.99   and   P(+|Sᶜ) = 0.02.

We are looking for P(S|+). By the law of total probability,

    P(+) = P(+|S) P(S) + P(+|Sᶜ) P(Sᶜ),

and therefore

    P(S|+) = P(S ∩ +) / P(+) = P(+|S) P(S) / P(+) = (0.99 · 0.001) / (0.99 · 0.001 + 0.02 · 0.999),

which is approximately 1/20. This suggests that while the test may give some indication and serve as a first diagnostic tool, it is too inaccurate to allow the conclusion of a diagnosis without performing further examinations.


Exercise 2.2.2. About 5% of all men and 1% of all women suffer from dichromatism. Suppose that 60% of a group are women. If a person is randomly selected from that group, what is the probability that this individual suffers from dichromatism?

In our experiment or observation it may happen that the relative frequency of the occurrence of some event A is not affected by whether some other given event B occurs or not. In other words, if n is the total number of trials, n_A denotes the number of trials in which A occurs, and n_B, n_{A∩B} are defined accordingly, we observe that the numbers

    n_A / n   and   n_{A∩B} / n_B

are close to each other, at least for large n. A heuristic rearrangement of this relation is to say that the numbers

    n_{A∩B} / n   and   (n_A / n) · (n_B / n)

are close. In the mathematical model this is formalized by the notion of independence.

Definition 2.2.2. Let (Ω, P) be a discrete probability space. Two events A and B are called independent if

    P(A ∩ B) = P(A) P(B).

Events A_1, ..., A_n are called independent if

    P(A_{j_1} ∩ ... ∩ A_{j_k}) = ∏_{i=1}^k P(A_{j_i})

for all distinct j_1, ..., j_k ∈ {1, ..., n}. The events A_1, ..., A_n are called pairwise independent if for any two different j, k ∈ {1, ..., n} the two events A_j and A_k are independent.

(At first glance independence looks like some algebraic relation. In a sense, this intuition is not all that wrong ... maybe at some point you will encounter or have encountered product measures, characteristic functions and group characters, etc.)

Examples 2.2.4. Three events A, B and C are independent if

1. P(A ∩ B ∩ C) = P(A) P(B) P(C),

2. P(A ∩ B) = P(A) P(B),

3. P(A ∩ C) = P(A) P(C) and

4. P(B ∩ C) = P(B) P(C).

Obviously independence implies pairwise independence. In general the converse is false.

Exercise 2.2.3. Give an example of three events A, B, C that are pairwise independent but not independent.

Exercise 2.2.4. Verify that if two events A and B are independent, then also A and Bᶜ are independent, and Aᶜ and Bᶜ are independent.

Whether in a mathematical model two events are independent or not is a consequence of the choice of the probability measure.

Examples 2.2.5. We toss two coins, not necessarily fair. A suitable sample space is

    Ω := {H, T}² = {(ω_1, ω_2) : ω_i ∈ {H, T}, i = 1, 2}.

For a first model, let p, p′ ∈ (0, 1) and put

    P({(H, H)}) = p p′,
    P({(T, H)}) = (1 − p) p′,
    P({(H, T)}) = p (1 − p′),
    P({(T, T)}) = (1 − p)(1 − p′).        (2.1)

Then the events {ω_1 = H} and {ω_2 = H} are independent under P with

    P({ω_1 = H}) = p and P({ω_2 = H}) = p′.        (2.2)

Conversely, it is not difficult to see that if we require these events to be independent under a probability measure P and to have (2.2), then P has to be as in (2.1).

For a second model, let the probability to get heads in the first trial be p ∈ (0, 1). If the first trial gives heads, then let the probability to get heads in the second trial be p′ ∈ (0, 1), otherwise let it be q ∈ (0, 1) with q ≠ p′. For a probability measure Q on Ω satisfying these ramifications we must have

    Q({ω_1 = H}) = p,
    Q({ω_2 = H} | {ω_1 = H}) = p′,
    Q({ω_2 = H} | {ω_1 = T}) = q.

This yields

    Q({ω_1 = H} ∩ {ω_2 = H}) = Q({ω_2 = H} | {ω_1 = H}) Q({ω_1 = H}) = p′ p.

But on the other hand,

    Q({ω_2 = H}) = Q({ω_2 = H} | {ω_1 = H}) Q({ω_1 = H}) + Q({ω_2 = H} | {ω_1 = T}) Q({ω_1 = T}) = p′ p + q (1 − p),

and therefore

    Q({ω_1 = H}) Q({ω_2 = H}) = p (p′ p + q (1 − p)) ≠ p′ p = Q({ω_1 = H} ∩ {ω_2 = H}),

i.e. the events {ω_1 = H} and {ω_2 = H} are not independent under Q.
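The second model can be tabulated and tested for independence directly (an added sketch; the parameter values p = 0.5, p′ = 0.7, q = 0.3 are arbitrary choices for illustration):

    p, p2, q = 0.5, 0.7, 0.3    # p2 plays the role of p', and q != p'

    # the measure Q, built from the conditional probabilities above
    Q = {('H', 'H'): p * p2,      ('H', 'T'): p * (1 - p2),
         ('T', 'H'): (1 - p) * q, ('T', 'T'): (1 - p) * (1 - q)}

    Q1H = sum(v for k, v in Q.items() if k[0] == 'H')   # Q(omega_1 = H)
    Q2H = sum(v for k, v in Q.items() if k[1] == 'H')   # Q(omega_2 = H)
    print(Q[('H', 'H')], Q1H * Q2H)   # 0.35 versus 0.25: not independent under Q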

    2.3 Discrete random variables

When performing and observing an experiment it is often useful to filter or rearrange information or to change perspective. For instance, we might measure a temperature, viewed as the random outcome of the experiment, and want to calculate a reaction intensity that depends on the given temperature. Then this reaction intensity itself will be random. This, more or less, is the concept of a random variable, or more generally, a random element.

Definition 2.3.1. Let (Ω, P) be a discrete probability space and E ≠ ∅ a set. A function X : Ω → E is called a random element with values in E. A function X : Ω → ℝ is called a random variable.


A random variable is a random element with values in E = ℝ.

Notation: We agree to write

    {X ∈ B} := {ω ∈ Ω : X(ω) ∈ B}

for any random element X with values in E and any B ⊆ E, and in the special case B = {x} also

    {X = x} := {ω ∈ Ω : X(ω) = x}.

Similarly, we agree to write

    P(X ∈ B) := P({X ∈ B}),

and in the special case B = {x} also

    P(X = x) := P({X = x}).

These abbreviations are customary.

Examples 2.3.1. Given an event A ⊆ Ω, set

    1_A(ω) := 1 if ω ∈ A,   and   1_A(ω) := 0 if ω ∈ Aᶜ.

This defines a random variable 1_A on Ω, usually referred to as the indicator function of the event A. Note that {1_A = 1} = A, {1_A = 0} = Aᶜ and {1_A = x} = ∅ for any x ∈ {0, 1}ᶜ.

Since Ω is discrete, any random element X on Ω with values in a set E can have at most a countable number of different values x ∈ E. If we enumerate these countably many different values by {x_j}_{j=1}^∞, then the events {X = x_j} are pairwise disjoint and Ω = ⋃_{j=1}^∞ {X = x_j}. If X attains only finitely many different values then of course the same is true with a finite number n in place of ∞.

If X is a random variable, we may rewrite it as

    X = ∑_{j=1}^∞ x_j 1_{X=x_j},

where 1_{X=x_j} is the indicator function of the event {X = x_j}.
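As a small illustration (an added Python sketch, assuming the die space Ω = {1, ..., 6} from earlier examples), here is an indicator function and a pointwise check of the representation above:

    Omega = range(1, 7)                     # the die from Examples 2.1.5
    A = {2, 4, 6}                           # the event "even number"
    one_A = lambda w: 1 if w in A else 0    # the indicator function 1_A

    X = lambda w: w % 3                     # some random variable on Omega
    values = sorted({X(w) for w in Omega})  # the distinct values x_j of X
    for w in Omega:
        # check X(w) = sum over j of x_j * 1_{X = x_j}(w)
        assert X(w) == sum(x * (1 if X(w) == x else 0) for x in values)
    print([(w, one_A(w), X(w)) for w in Omega])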

    2.4 Bernoulli trials


    Chapter 3

Absolutely continuous distributions


    Chapter 4

Measure theory and Lebesgue integration


    Chapter 5

Product spaces and independence


    Chapter 6

Weak convergence and the Central Limit Theorem


    Chapter 7

    Conditioning


    Chapter 8

    Martingale theory
