
Transcript of Basic Statistics for SGPE Students, Part II: Probability distributions

Page 1: Basic Statistics for SGPE Students, Part II: Probability distributions

Basic Statistics for SGPE Students

Part II: Probability distributions [1]

Nicolai Vitt ([email protected])

University of Edinburgh

September 2019

[1] Thanks to Achim Ahrens, Anna Babloyan and Erkal Ersoy for creating these slides and allowing me to use them.

Page 2

Outline
1. Probability theory
   - Conditional probabilities and independence
   - Bayes' theorem
2. Probability distributions
   - Discrete and continuous probability functions
   - Probability density function & cumulative distribution function
   - Binomial, Poisson and Normal distribution
   - E[X] and V[X]
3. Descriptive statistics
   - Sample statistics (mean, variance, percentiles)
   - Graphs (box plot, histogram)
   - Data transformations (log transformation, unit of measure)
   - Correlation vs. Causation
4. Statistical inference
   - Population vs. sample
   - Law of large numbers
   - Central limit theorem
   - Confidence intervals
   - Hypothesis testing and p-values

1 / 61

Page 3

Random variables
Most of the outcomes or events we have considered so far have been non-numerical, e.g. either head or tail. If the outcome of an experiment is numerical, we call the variable that is determined by the experiment a random variable.

Random variables may be either discrete (e.g. the number of days the sun shines) or continuous (e.g. your salary after graduating from the MSc). In contrast to a continuous random variable, we can list the distinct potential outcomes of a discrete random variable.

Notation
Random variables are usually denoted by capital letters, e.g. X. The corresponding realisations are denoted by small letters, e.g. x.

2 / 61

Page 4

Should you make the bet?
Example III.1
I propose the following game. We toss a fair coin 10 times. If head appears 4 times or less, I pay you £2. If head appears more than 4 times, you pay me £1. Should you make the bet?

Let's try to formalise the problem. Let the random variables X1, X2, . . . , X10 be defined such that

Xi = 1 if head appears on the i-th toss,
Xi = 0 if tail appears on the i-th toss,     for i = 1, . . . , 10.

3 / 61

Page 5

Should you make the bet?
Furthermore, let the random variable Y denote the number of heads. Clearly,

Y = X1 + X2 + . . . + X10.

If the realisation of Y is greater than 4, I win.

Let P(Y = y) denote the probability that Y takes the value y. Accordingly, P(Y ≤ 4) is the probability that we obtain 4 or fewer heads and P(Y > 4) is the probability that we obtain more than 4 heads. When would you make the bet?

Your expected value is

E[V] = P(Y ≤ 4) · £2 + P(Y > 4) · (−£1)

where V is the money you get. If E[V] > 0 (and you are risk neutral), you'll choose to play.

4 / 61

Page 6

Should you make the bet?

Expected value
The expected value of a discrete random variable X is denoted by E[X] and given by

E[X] = x1·P(X=x1) + x2·P(X=x2) + · · · + xk·P(X=xk) = Σ_{i=1}^{k} xi·P(X=xi)

where k is the number of distinct outcomes.
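The definition above can be checked with a short sketch; the fair-die example here is my own, not from the slides.

```python
# Hypothetical example (not from the slides): E[X] for a discrete random
# variable, here the roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6                    # P(X = x_i) for each outcome

# E[X] = sum over i of x_i * P(X = x_i)
expected_value = sum(x * p for x, p in zip(outcomes, probs))
print(expected_value)                  # ≈ 3.5
```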

5 / 61

Page 7

Should you make the bet?
To solve the problem, we need to find P(Y ≤ 4) and P(Y > 4). From the additive law (Rule 4), we know that

P(Y ≤ 4) = P(Y=0 ∪ Y=1 ∪ Y=2 ∪ Y=3 ∪ Y=4)
         = P(Y=0) + P(Y=1) + P(Y=2) + P(Y=3) + P(Y=4)

P(Y > 4) = P(Y=5) + P(Y=6) + P(Y=7) + P(Y=8) + P(Y=9) + P(Y=10)

Hence, we need to find P(Y = y) for y = 0, . . . , 10.

6 / 61

Page 8

Discrete probability distribution
It is common to denote the probability distribution of a discrete random variable Y by f(y).

Discrete probability distribution
The probability distribution or probability mass function of a discrete random variable X associates with each of the distinct potential outcomes xi (i = 1, . . . , k) a probability P(X = xi). That is,

f(xi) = P(X = xi).

The probabilities sum to 1, i.e.

Σ_{i=1}^{k} f(xi) = 1.

7 / 61

Page 9

Discrete probability distribution
Two examples:

Example III.2 (Discrete uniform distribution)
Let X be the result from rolling a fair die. The probability distribution is simply

f(x) = P(X = x) = 1/6 for x ∈ {1, 2, . . . , 6}, and 0 otherwise.

This probability distribution is an example of a discrete uniform distribution.

Bernoulli distribution
A random variable X has a Bernoulli distribution with parameter P(X = 1) = p (i.e. the probability of success) if X can take only the values 1 (success) and 0 (failure). The probability distribution is given by

f(x) = p if x = 1, 1 − p if x = 0, and 0 otherwise.
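As a small illustrative sketch (my own, with an assumed p = 0.3), Bernoulli draws can be simulated and their relative frequency of successes compared with p:

```python
# Sketch: simulate Bernoulli(p) draws; the share of successes should be near p.
import random

random.seed(1)                         # fixed seed for a reproducible run
p = 0.3                                # assumed success probability
draws = [1 if random.random() < p else 0 for _ in range(100_000)]
rel_freq = sum(draws) / len(draws)
print(rel_freq)                        # close to 0.3
```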

8 / 61

Page 10

Binomial coefficient & binomial distribution
Let's start with f(0) = P(Y=0), which is the probability of obtaining no heads. Using the multiplicative law,

P(Y=0) = P(X1=0)·P(X2=0) · · · P(X10=0) = (1/2)^10 = 0.00097656

Now, f(1) = P(Y=1). Since we are interested in the number of heads, we have to take into account that there is more than one combination that results in 1 head.

P(Y=1) = P(X1=1)·P(X2=0) · · · P(X10=0)
       + P(X1=0)·P(X2=1) · · · P(X10=0)
       + . . .
       + P(X1=0)·P(X2=0) · · · P(X10=1)
       = 10 · (1/2)^10 = 0.00976563

9 / 61

Page 11

Binomial coefficient & binomial distribution
Now, f(2) = P(Y=2). How many combinations are there that yield 2 heads out of 10 tosses? Given that the first toss produces a head, there are 9 combinations that yield two heads in total. And so on...

              toss:  1  2  3  4  5  6  7  8  9  10
combination   1      H  H  T  T  T  T  T  T  T  T
              2      H  T  H  T  T  T  T  T  T  T
              3      H  T  T  H  T  T  T  T  T  T
              4      H  T  T  T  H  T  T  T  T  T
              5      H  T  T  T  T  H  T  T  T  T
              6      H  T  T  T  T  T  H  T  T  T
              7      H  T  T  T  T  T  T  H  T  T
              8      H  T  T  T  T  T  T  T  H  T
              9      H  T  T  T  T  T  T  T  T  H

10 / 61

(Pages 12 to 20 repeat the table with the fixed head in each of the other nine toss positions; each position again yields 9 combinations.)

This gives us 10 · 9. This approach has the problem of double counting: each combination appears twice. So we have to divide by 2 and get (10 · 9)/2 distinct combinations. Thus,

P(Y = 2) = (10 · 9)/2 · (1/2)^10 = 0.04394531.

10 / 61

Page 21

Binomial coefficient & binomial distribution
For P(Y = 3), P(Y = 4), . . . this gets even more complicated.

Binomial coefficient
Suppose that there is a set of n distinct elements from which it is desired to choose a subset of k elements (typically 1 ≤ k ≤ n). The binomial coefficient gives the number of ways k elements can be selected from n elements. The binomial coefficient is defined as

C(n, k) = n! / (k!(n − k)!)

where k! = k(k − 1)(k − 2) . . . 1 and 0! = 1.

Remark
Note that

n! / (n − k)! = n · (n − 1) · (n − 2) · . . . · (n − k + 1).

For example,

7! / (7 − 3)! = 7!/4! = (7 · 6 · 5 · 4 · 3 · 2 · 1) / (4 · 3 · 2 · 1) = 7 · 6 · 5.

11 / 61

Page 22

Binomial coefficient & binomial distribution
Let's consider another example to get a better understanding of the binomial coefficient.

Example III.3
Imagine a box with four distinct elements (n = 4) denoted as a, b, c, d. We want to randomly pick two elements (k = 2). If the order of selecting elements matters, there exist 4 · 3 different combinations. However, we don't want the order to matter, so we divide by 2, as there are two ways of ordering two elements ({b, a} and {a, b}). Therefore, there are

(4 · 3)/2 = 4! / (2!(4 − 2)!) = C(4, 2) = 6

different combinations:
{a, b} {a, c} {a, d} {b, c} {b, d} {c, d}.

12 / 61

Page 23

Combination vs. permutation
Example III.3 (continued)

Note the distinction between permutation (order matters) and combination (order does not matter).

If order matters (e.g. we distinguish between {a, b} and {b, a}), the solution to the above problem is simply 4 · 3 = 12.
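The standard library can mirror Example III.3; this is just a sketch of the combination/permutation distinction:

```python
# Sketch: combinations (order does not matter) vs. permutations (order matters)
# when choosing k = 2 of the n = 4 elements a, b, c, d.
import itertools
import math

elements = ["a", "b", "c", "d"]
combos = list(itertools.combinations(elements, 2))   # {a,b} same as {b,a}
perms = list(itertools.permutations(elements, 2))    # (a,b) distinct from (b,a)

print(len(combos))        # 6
print(len(perms))         # 12
print(math.comb(4, 2))    # 6, the binomial coefficient C(4, 2)
```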

13 / 61

Page 24

Binomial coefficient & binomial distribution
Back to our problem: For P(Y = 3),

f(3) = P(Y = 3) = C(10, 3) · (1/2)^10 = 10! / (3!(10 − 3)!) · (1/2)^10
     = (10 · 9 · 8) / (3 · 2 · 1) · (1/2)^10 = 0.1171875.

The binomial coefficient allows us to find a general expression for f(y).

Binomial distribution
If the random variables X1, . . . , Xn form n Bernoulli trials with parameter p (i.e. probability of success), then Y = X1 + · · · + Xn follows a binomial distribution. The binomial distribution is given by

f(y; n, p) = C(n, y) · p^y · (1 − p)^(n−y)

for y = 0, 1, . . . , n.

14 / 61

Page 25

Binomial distribution
We now know the specific functional form of f(y) = P(Y = y). Hence, we can obtain the probability that we draw 0, 1, 2, . . . , 10 heads.

y    f(y)
0    0.00098
1    0.00977
2    0.04395
3    0.11719
4    0.20508
5    0.24609
6    0.20508
7    0.11719
8    0.04395
9    0.00977
10   0.00098

[Figure: binomial distribution (n=10, p=0.5)]
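A sketch reproducing the table from the binomial formula, using the standard library's math.comb:

```python
# Sketch: f(y) = C(n, y) p^y (1 − p)^(n−y) for n = 10, p = 0.5.
import math

n, p = 10, 0.5
f = [math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

for y, prob in enumerate(f):
    print(y, round(prob, 5))          # e.g. f(2) = 0.04395, f(5) = 0.24609
```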

15 / 61

Page 26

Cumulative distribution function
However, we are interested in P(Y ≤ 4).

Cumulative distribution function
The cumulative distribution function of a discrete random variable X is denoted by F(x) and is defined as

F(x) = P(X ≤ x)

where −∞ ≤ x ≤ +∞. The cumulative distribution function F(x) gives the probability that the outcome of X in a random trial will be less than or equal to any specified value x.

16 / 61

Page 27

Binomial distribution

y    f(y)     F(y)
0    0.00098  0.00098
1    0.00977  0.01074
2    0.04395  0.05469
3    0.11719  0.17188
4    0.20508  0.37695
5    0.24609  0.62305
6    0.20508  0.82813
7    0.11719  0.94531
8    0.04395  0.98926
9    0.00977  0.99902
10   0.00098  1.00000

[Figure: cumulative distribution function F(y)]

For example,
F(2) = f(0) + f(1) + f(2) = 0.00098 + 0.00977 + 0.04395 = 0.05469.
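A sketch building the CDF as the running sum of the binomial probabilities (n = 10, p = 0.5):

```python
# Sketch: F(y) = f(0) + ... + f(y) for the binomial pmf with n = 10, p = 0.5.
import itertools
import math

n, p = 10, 0.5
f = [math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]
F = list(itertools.accumulate(f))      # running sums give the CDF

print(round(F[2], 5))                  # 0.05469
print(round(F[4], 5))                  # 0.37695
```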

17 / 61

Page 28

Should you make the bet?

Example III.1 (continued)

I propose the following game. We toss a fair coin 10 times. If head appears 4 times or less, I pay you £2. If head appears more than 4 times, you pay me £1. Should you make the bet?

We can finally solve the problem. Your expected value is

E[V] = P(Y ≤ 4) · £2 + P(Y > 4) · (−£1)
     = F(4) · £2 + (1 − F(4)) · (−£1)
     = 0.377 · £2 + 0.623 · (−£1) ≈ £0.131.

You should make the bet (if you are risk neutral)!

What does E[V] mean? If we repeat the game an infinite number of times, your average payoff will be £0.131.
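The calculation above can be sketched in a few lines:

```python
# Sketch: the expected payoff of the bet, E[V] = F(4)·£2 + (1 − F(4))·(−£1).
import math

n, p = 10, 0.5
F4 = sum(math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(5))  # P(Y <= 4)
EV = F4 * 2 + (1 - F4) * (-1)          # expected payoff in pounds
print(round(EV, 3))                    # 0.131
```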

18 / 61

Page 29

Binomial distribution (Simulation)
Suppose that we play the game m times. That is, we toss the coin 10 times, write down the number of heads, and play again.

Let's set m = 20. We get: 3 7 5 2 3 3 2 5 3 7 4 4 4 4 5 3 4 5 4 2.

y              0    1    2    3    4    5    6    7    8    9    10
frequency      0    0    3    5    6    4    0    2    0    0    0
rel. frequency 0.00 0.00 0.15 0.25 0.30 0.20 0.00 0.10 0.00 0.00 0.00

[Figures: empirical binomial distribution (n=10, p=0.5), 20 repetitions, next to the theoretical binomial distribution (n=10, p=0.5)]

19 / 61

Page 30

Binomial distribution (Simulation)
Suppose that we play the game m times. That is, we toss the coin 10 times, write down the number of heads, and play again.

Let's set m = 50.

y              0    1    2    3    4    5    6    7    8    9    10
frequency      0    0    3    7    11   8    7    7    4    3    0
rel. frequency 0.00 0.00 0.06 0.14 0.22 0.16 0.14 0.14 0.08 0.06 0.00

[Figures: empirical binomial distribution (n=10, p=0.5), 50 repetitions, next to the theoretical binomial distribution (n=10, p=0.5)]

19 / 61

Page 31

Binomial distribution (Simulation)
Suppose that we play the game m times. That is, we toss the coin 10 times, write down the number of heads, and play again.

Let's set m = 100.

y              0    1    2    3    4    5    6    7    8    9    10
frequency      0    2    3    17   26   22   14   11   3    2    0
rel. frequency 0.00 0.02 0.03 0.17 0.26 0.22 0.14 0.11 0.03 0.02 0.00

[Figures: empirical binomial distribution (n=10, p=0.5), 100 repetitions, next to the theoretical binomial distribution (n=10, p=0.5)]

19 / 61

Page 32

Binomial distribution (Simulation)
Suppose that we play the game m times. That is, we toss the coin 10 times, write down the number of heads, and play again.

Let's set m = 10,000.

y              0      1      2      3      4      5      6      7      8      9      10
frequency      4      89     429    1171   2045   2470   2075   1198   411    103    5
rel. frequency 0.0004 0.0089 0.0429 0.1171 0.2045 0.2470 0.2075 0.1198 0.0411 0.0103 0.0005

[Figures: empirical binomial distribution (n=10, p=0.5), 10000 repetitions, next to the theoretical binomial distribution (n=10, p=0.5)]
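The simulation described above can be sketched as follows (my own seed and loop structure, not the slides' code):

```python
# Sketch of the simulation: play the game m times, record the number of heads
# in n = 10 tosses each time, and compare relative frequencies with f(y).
import random

random.seed(42)                        # fixed seed for a reproducible run
m, n = 10_000, 10
counts = [0] * (n + 1)
for _ in range(m):
    heads = sum(random.randint(0, 1) for _ in range(n))
    counts[heads] += 1

rel_freq = [c / m for c in counts]
print(rel_freq[5])                     # should be near f(5) = 0.24609
```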

19 / 61

Page 33

Binomial distribution
The binomial distribution has two parameters: n is the number of (Bernoulli) trials and p is the probability of success in each trial. What does the distribution look like for different values of n and p?

[Figures: binomial distributions for (n=10, p=0.5), (n=10, p=0.7), (n=10, p=0.9), (n=30, p=0.5), (n=30, p=0.7) and (n=30, p=0.9)]

20 / 61

Page 34

Binomial distribution: Expected value and variance

In the same way we summarise an observed dataset by the sample average and the sample variance (or standard deviation), we can characterise a probability distribution by its expected value and its variance. From the figures we can see that expected value and variance change with n and p.

To find the expected value of Y, note that:

Linearity of expectation
If Y is the sum of random variables X1, X2, . . . , Xn, then:

E[Y] = E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E[Xi].

Furthermore, if c is a constant (i.e., non-random) and X a random variable, then

E[X + c] = E[X] + c,
E[cX] = c · E[X].

21 / 61

Page 35

Binomial distribution: Expected value and variance

Recall that Y = X1 + X2 + X3 + · · · + Xn. Therefore,

E[Y] = E[X1] + E[X2] + E[X3] + · · · + E[Xn].

Recall that Xi follows a Bernoulli distribution. The expected value of a Bernoulli variable is

E[Xi] = p · 1 + (1 − p) · 0 = p.

Therefore,

E[Y] = E[X1] + E[X2] + E[X3] + . . . + E[Xn] = np.

22 / 61

Page 36

Binomial distribution: Expected value and variance

Variance
The variance of a discrete random variable X is denoted by V[X] and given by

V[X] = Σ_{i=1}^{k} (xi − E[X])² · P(X = xi).

Variance of the sum of uncorrelated random variables
If Y is the sum of independent (!) random variables X1, X2, . . . , Xn, then:

V[Y] = V[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} V[Xi].

The variance of Xi is given by

V[Xi] = (1 − E[Xi])² · p + (0 − E[Xi])² · (1 − p) = p(1 − p).

Therefore,

V[Y] = np(1 − p).

23 / 61
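The two results can be checked directly against the binomial probability mass function; a sketch for n = 10, p = 0.5 (so np = 5 and np(1 − p) = 2.5):

```python
# Sketch: E[Y] = np and V[Y] = np(1 − p), verified from the binomial pmf.
import math

n, p = 10, 0.5
f = [math.comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

mean = sum(y * fy for y, fy in enumerate(f))                 # E[Y]
var = sum((y - mean) ** 2 * fy for y, fy in enumerate(f))    # V[Y]
print(mean, var)
```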

Page 37

Poisson distribution
Example III.4
Let X be the number of cars passing by in an hour. On average λ cars pass by. What is the probability that 3 cars pass by?

To simplify the problem, we divide each hour into 60 minutes. Since E[X] = λ, the probability that one car passes by in any particular minute is λ/60.

Using this simplification, we can work with the binomial distribution.

f(3) = P(X = 3) ≈ C(60, 3) · (λ/60)³ · (1 − λ/60)^(60−3)

However, this approach does not take into account that more than one car may pass by in a minute.

24 / 61

Page 38

Poisson distribution
We can address this problem by dividing each hour into 3,600 seconds, 3,600,000 milliseconds, and so on. More generally, with n being the number of units we divide the hour into:

f(x) = P(X = x) ≈ C(n, x) · (λ/n)^x · (1 − λ/n)^(n−x)

Let n → ∞, do some maths and you'll arrive at:

Poisson distribution
If X follows a Poisson process, then

f(x) = P(X = x) = e^(−λ) · λ^x / x!

for x = 0, 1, 2, 3, . . . , ∞. Note that e = lim_{n→∞} (1 + 1/n)^n = 2.718 . . . .
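The limiting argument can be illustrated numerically; a sketch with assumed values λ = 3 and x = 3:

```python
# Sketch: the binomial approximation C(n, x)(λ/n)^x (1 − λ/n)^(n−x)
# converges to the Poisson pmf e^(−λ) λ^x / x! as n grows.
import math

lam, x = 3, 3
poisson = math.exp(-lam) * lam**x / math.factorial(x)

for n in (60, 3_600, 3_600_000):
    binom = math.comb(n, x) * (lam / n) ** x * (1 - lam / n) ** (n - x)
    print(n, round(binom, 6))

print(round(poisson, 6))               # the limit of the values above
```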

25 / 61

Page 39

Poisson distribution

[Figures: Poisson distributions for lambda=1 and lambda=3]

The Poisson distribution is asymmetric (right-skewed) for E[X] = λ = 1 and λ = 3.

26 / 61

Page 40

Poisson distribution

[Figures: Poisson distributions for lambda=10 and lambda=1000]

The higher λ, the more symmetric the Poisson distribution becomes. Also, the distribution looks very similar to the normal distribution!

27 / 61

Page 41

Continuous distributions: Probability density function (PDF)

If the random variable X is continuous, we use f(x) to denote the probability density function (PDF) of X. The PDF satisfies two requirements:

(1) f(x) ≥ 0  and  (2) ∫_{−∞}^{+∞} f(x) dx = 1

Remark (!)
If the random variable X is continuous, then the probability that X takes a particular value x is zero. That is,

f(x) ≠ P(X = x) = 0.

28 / 61

Page 42

Continuous distributions
Uniform distribution
Let a and b be two real numbers (a < b) and consider an experiment where a number X is randomly selected from the interval [a, b]. If the probability that X belongs to any subinterval of [a, b] is proportional to the length of the subinterval, we say that X is uniformly distributed. The PDF of X is given by

f(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise.

We write that X ∼ u(a, b).
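The proportional-subinterval property can be checked by simulation; a sketch with my own assumed values a = 1, b = 3:

```python
# Sketch: for X ~ u(a, b), the probability of a subinterval equals its
# length divided by (b − a).
import random

random.seed(7)                         # reproducible run
a, b = 1.0, 3.0
m = 100_000
draws = [random.uniform(a, b) for _ in range(m)]

# P(1.5 <= X <= 2.0) should be (2.0 − 1.5) / (b − a) = 0.25
share = sum(1.5 <= x <= 2.0 for x in draws) / m
print(share)                           # close to 0.25
```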

29 / 61

Page 43

Continuous distributions
Example III.5
Anna and Achim arrange to meet 'between 1pm and 2pm' at Old College. Their arrival time is uniformly distributed and they arrive independently of each other. What is the probability that no one will have to wait more than 15 minutes?

Let X denote Anna's arrival time and Y denote Achim's arrival time.
- Express the event 'no one will wait for more than 15 minutes' in terms of X and Y.
- What is the joint distribution of X and Y?
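A Monte Carlo sketch of Example III.5 (my own framing: measuring arrival times in hours after 1pm, so X, Y ∼ u(0, 1) independently and the event is |X − Y| ≤ 0.25). It only estimates the probability, it does not derive it:

```python
# Sketch: estimate P(|X − Y| <= 0.25) for independent X, Y ~ u(0, 1).
import random

random.seed(3)                         # reproducible run
m = 200_000
hits = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(m))
print(hits / m)                        # Monte Carlo estimate
```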

30 / 61

Page 44

Continuous distributions
The normal distribution is by far the single most important probability distribution. Many natural phenomena are (approximately) normally distributed. Another reason for its importance comes from the central limit theorem (to be discussed in the next lecture).

Normal distribution
If the random variable X follows a normal distribution with mean µ (−∞ < µ < ∞) and variance σ² (σ > 0), its PDF is given by

f(x) = 1/√(2πσ²) · e^(−(x − µ)²/(2σ²))

for −∞ < x < ∞. We write that X ∼ N(µ, σ²).
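The PDF formula can be evaluated directly; a sketch that also checks requirement (2), ∫ f(x) dx = 1, with a simple Riemann sum (µ = 0, σ = 1 assumed):

```python
# Sketch: the normal PDF, and a numerical check that it integrates to ~1.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

step = 0.001
total = sum(normal_pdf(-8 + i * step) * step for i in range(int(16 / step)))
print(round(total, 4))                 # ≈ 1.0
```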

31 / 61

Page 45

Continuous distributions

[Figures: the PDFs of N(0, 1), u(1, 3) and u(−3, 0)]

32 / 61

Page 46

E[X] for continuous distributions
Expected value
The expected value of a continuous random variable is

E[X] = ∫_{−∞}^{+∞} x f(x) dx.

E[X] is the balance point of the probability mass: the probability mass to the left of E[X] is in balance with the probability mass to the right of E[X].

33 / 61

Page 47

E[X] for continuous distributions
Thus, the expected value of the normal distribution is simply at its highest point (due to symmetry) and the expected value of a uniform distribution is half way between a and b.

[Figures: a normal PDF with E[X] marked at its peak, and a uniform PDF on [a, b] with E[X] marked at the midpoint]

34 / 61

Page 48

E[X] for the uniform distribution

Let's do this formally for the uniform distribution.

E[X] = ∫_{−∞}^{+∞} x f(x) dx            (1)
     = ∫_a^b x · 1/(b − a) dx            (2)
     = 1/(b − a) ∫_a^b x dx              (3)
     = 1/(b − a) [x²/2]_a^b              (4)
     = 1/(b − a) · (b² − a²)/2           (5)
     = 1/2 · (b + a)(b − a)/(b − a)      (6)
     = (a + b)/2                         (7)
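The result E[X] = (a + b)/2 can be checked numerically with a simple midpoint Riemann sum; a minimal sketch (the endpoints a = 1, b = 3 are illustrative choices, not from the slides):

```python
# Numerically approximate E[X] for X ~ u(a, b) with a midpoint Riemann sum.
a, b = 1.0, 3.0          # illustrative endpoints
n = 100_000              # number of subintervals
dx = (b - a) / n

# E[X] = integral of x * f(x) dx, with f(x) = 1/(b - a) on [a, b].
expected = sum((a + (i + 0.5) * dx) * (1 / (b - a)) * dx for i in range(n))

print(round(expected, 6))  # close to (a + b)/2 = 2.0
```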


E[X] for the uniform distribution

(1) By the definition of the expected value.
(2) By the definition of the uniform distribution: f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.
(3) If k is a constant, then ∫ k f(x) dx = k ∫ f(x) dx.
(4) Since ∫ xⁿ dx = xⁿ⁺¹/(n + 1) + c  (n ≠ −1).
(4-5) Since ∫_a^b f(x) dx = [F(x)]_a^b = F(b) − F(a), where dF(x)/dx = f(x).


V[X] for continuous distributions

Variance
The variance of a continuous random variable X is

V[X] = ∫_{−∞}^{+∞} (x − E[X])² f(x) dx.    (8)

[Figure: PDFs of N(0, 1), N(0, 2) and N(0, 3); a larger variance gives a flatter, more spread-out density.]


V[X] for the uniform distribution

Instead of using the definition above, we can make use of the following:

Variance
V[X] = E[(X − E[X])²]
     = E[X² − 2X E[X] + (E[X])²]
     = E[X²] − 2(E[X])² + (E[X])²
     = E[X²] − (E[X])²

where the third line uses the facts that E[A + B] = E[A] + E[B] and that E[X] is a constant, so E[X E[X]] = (E[X])² and E[(E[X])²] = (E[X])².

We know E[X]. Hence, we only need to find E[X²].


V[X] for the uniform distribution

To find E[X²], we integrate x² against the PDF of X (this is sometimes called the law of the unconscious statistician).

E[X²] = ∫_{−∞}^{+∞} x² f(x) dx
      = 1/(b − a) ∫_a^b x² dx
      = 1/(b − a) [x³/3]_a^b
      = 1/(b − a) · (b³ − a³)/3
      = 1/3 · (a² + ab + b²)(b − a)/(b − a)
      = (a² + ab + b²)/3

V[X] = E[X²] − (E[X])² = (a² + ab + b²)/3 − ((a + b)/2)² = (b − a)²/12
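Both formulas can be sanity-checked by simulation; a minimal sketch using only the standard library (the endpoints a = 1, b = 3 are illustrative, not from the slides):

```python
import random

# Simulate X ~ u(a, b) and compare sample moments with the formulas
# E[X] = (a + b)/2 and V[X] = (b - a)^2 / 12.
random.seed(42)
a, b = 1.0, 3.0
draws = [random.uniform(a, b) for _ in range(200_000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)

print(mean)  # should be close to (a + b)/2 = 2.0
print(var)   # should be close to (b - a)**2 / 12 = 0.333...
```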


PDF and probability

Consider the standard normal distribution depicted in the figure. As you can see, f(0) ≈ 0.4; to be precise, f(0) = 0.3989423… It is important to understand that this does not mean that P(X = 0) = 0.3989423…! If a random variable is continuous, there are infinitely many distinct values that the random variable can take. Thus, the probability that the random variable takes any specific value is zero.

[Figure: PDF of N(0, 1), peaking at f(0) ≈ 0.4.]


PDF and probability

However, we can say that the probability that X is below 0 is equal to the shaded grey area under the PDF. Since we know that the total area under f(x) is 1, we know (due to symmetry) that P(X ≤ 0) = 0.5.

[Figure: PDF of N(0, 1) with the area to the left of 0 shaded.]


PDF and probability

But what is, say, P(X ≤ −1)?

[Figure: PDF of N(0, 1) with the area to the left of −1 shaded.]


CDF and probability

Cumulative distribution function
The cumulative distribution function (CDF) of a continuous random variable X is denoted by F(x) and given by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du    for −∞ < x < +∞.

The CDF gives the probability that the outcome of X in a random experiment is less than or equal to x.
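For the standard normal, the CDF has no closed form, but it can be written via the error function as Φ(x) = ½(1 + erf(x/√2)), which Python's standard library provides; a minimal sketch (this erf-based formula is standard, but not part of the slides):

```python
from math import erf, sqrt

def std_normal_cdf(x: float) -> float:
    """CDF of N(0, 1), expressed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

print(std_normal_cdf(0))    # 0.5, by symmetry
print(std_normal_cdf(-1))   # approximately 0.159
```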


CDF and probability

We can read from the CDF that

F(−1) = P(X ≤ −1) ≈ 0.159

and

F(0) = P(X ≤ 0) = 0.5.

[Figure: PDF of N(0, 1) with the area left of −1 shaded; CDF of N(0, 1) with F(−1) ≈ 0.159 and F(0) = 0.5 marked.]


CDF and probability

What is the probability that X lies between −1 and 0? That is, what is

P(−1 ≤ X ≤ 0)?

It is simply

F(0) − F(−1) = 0.5 − 0.159 ≈ 0.341.

[Figure: PDF of N(0, 1) with the area between −1 and 0 shaded; CDF of N(0, 1) with F(−1) ≈ 0.159 and F(0) = 0.5 marked.]


CDF and probability

What is the probability that X is below +1? Due to symmetry,

F(1) = 1 − F(−1).

Thus,

1 − F(−1) = 1 − 0.159 ≈ 0.841.

[Figure: PDF of N(0, 1) with the area left of 1 shaded; CDF of N(0, 1) with F(−1) ≈ 0.159 and F(1) ≈ 0.841 marked.]


Inverse functions and CDF

We will often use the inverse function of the CDF.

Inverse function
In general, if g(x) is an invertible function, then the inverse function is given by

g⁻¹(g(x)) = x.

Intuition: a function works like a machine. It takes x as an input and returns the output g(x) = a. An inverse function works the other way around: g⁻¹(a) = x.

Suppose we are interested in the following question: what is the value of x such that P(X ≤ x) = 0.95?


Inverse functions and CDF

Suppose we are interested in the following question: what is the value of x such that P(X ≤ x) = 0.95? Using the inverse CDF:

F⁻¹(0.95) ≈ 1.64.

This implies that, due to symmetry, approximately 90% of the probability mass lies in the interval [−1.64, +1.64].

[Figure: PDF of N(0, 1) with −1, 1 and 1.64 marked; CDF of N(0, 1) with F(1.64) ≈ 0.95 marked.]
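Since the CDF is strictly increasing, F⁻¹(p) can be computed by bisection; a minimal sketch (the erf-based CDF used here is a standard identity, an assumption of this example rather than part of the slides):

```python
from math import erf, sqrt

def std_normal_cdf(x: float) -> float:
    """CDF of N(0, 1) via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def std_normal_inv_cdf(p: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Invert the standard normal CDF by bisection on [lo, hi]."""
    for _ in range(100):  # 100 halvings shrink the bracket far below float precision
        mid = (lo + hi) / 2
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(std_normal_inv_cdf(0.95), 4))  # approximately 1.6449
```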


Standard normal distribution

Standard normal distribution
If X ∼ N(µ, σ²), then

Z = (X − µ)/σ ∼ N(0, 1).

We say that Z follows a standard normal distribution. The PDF and CDF of the standard normal distribution are often denoted by φ(z) and Φ(z).

[Figure: PDFs of Z ∼ N(0, 1) and X ∼ N(10, 4).]


Standard normal distribution

Example III.6
Suppose X ∼ N(10, 4) and Z = (X − 10)/√4 ∼ N(0, 1). What is P(X ≤ 8)?

P(X ≤ 8) = P((X − µ)/σ ≤ (8 − µ)/σ)
         = P((X − 10)/√4 ≤ (8 − 10)/√4)
         = P(Z ≤ −1)
         ≈ 0.159

[Figure: PDFs of Z ∼ N(0, 1) and X ∼ N(10, 4), with the area left of −1 under the first and the area left of 8 under the second shaded.]
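The standardisation step can be checked numerically; a sketch expressing Φ via the error function (a standard identity used here as an assumption, not from the slides):

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """CDF of N(0, 1) via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10.0, sqrt(4)     # X ~ N(10, 4), so sigma = 2
x = 8.0

z = (x - mu) / sigma          # standardise: z = (8 - 10)/2 = -1
print(z)                      # -1.0
print(std_normal_cdf(z))      # P(X <= 8) = P(Z <= -1), approximately 0.159
```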


Multivariate distributions (discrete)

Joint probability function
The joint probability function of two discrete random variables X and Y is given by

f(x, y) = P(X = x and Y = y)    with    Σᵢ Σⱼ f(xᵢ, yⱼ) = 1.

Table of probabilities

X\Y    1     2     3     4
1      0.1   0     0.1   0
2      0.3   0     0.1   0.2
3      0     0.2   0     0
4      0     0     0     0

[Figure: 3D bar chart of f(x, y) over the grid x, y = 1, …, 4.]


Multivariate distributions (continuous)

Joint probability density function
The joint probability density function (or joint PDF) of two continuous random variables X and Y is given by

f(x, y)    with    ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(x, y) dx dy = 1.

[Figure: surface plot of a joint PDF f(x, y).]


Multivariate distributions (continuous)

Recall that the joint PDF f(x, y) integrates to 1 over the whole plane.

Example III.7
If X and Y have joint PDF f(x, y), then the probability that X lies between 0 and 2 and that, at the same time, Y lies between 0 and 1 is

P(0 ≤ X ≤ 2 and 0 ≤ Y ≤ 1) = ∫_0^1 ∫_0^2 f(x, y) dx dy.


Marginal distributions

Marginal probability function (discrete)
If X and Y are two discrete random variables for which the joint probability function is f(x, y), then the marginal probability function for X is

fX(x) = P(X = x) = Σ_y P(X = x and Y = y) = Σ_y f(x, y).

The marginal probability gives the probability of observing a specific value of X (say X = x). To calculate the probability of observing x, we need to add the probabilities of all events that correspond to X = x; that is, P(X = x) = f(x, y₁) + f(x, y₂) + … + f(x, yₙ).


Marginal distributions

Example III.8 (Discrete marginal distribution)

Table of probabilities

X\Y    1     2     3     4
1      0.1   0     0.1   0
2      0.3   0     0.1   0.2
3      0     0.2   0     0
4      0     0     0     0

What is the marginal probability function for X?

Marginal probability density function (continuous)
If X and Y are two continuous random variables for which the joint probability density function is f(x, y), then the marginal probability density function for X is

fX(x) = ∫_{−∞}^{+∞} f(x, y) dy.
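In the discrete case of Example III.8, the marginal for X is obtained by summing each row of the table; a minimal sketch:

```python
# Joint probabilities f(x, y) from Example III.8; rows indexed by x = 1..4,
# columns by y = 1..4.
joint = [
    [0.1, 0.0, 0.1, 0.0],   # x = 1
    [0.3, 0.0, 0.1, 0.2],   # x = 2
    [0.0, 0.2, 0.0, 0.0],   # x = 3
    [0.0, 0.0, 0.0, 0.0],   # x = 4
]

# Marginal f_X(x) = sum over y of f(x, y): sum each row (rounded to tidy floats).
f_x = [round(sum(row), 10) for row in joint]
print(f_x)                         # [0.2, 0.6, 0.2, 0.0]
print(abs(sum(f_x) - 1.0) < 1e-9)  # the marginal probabilities sum to 1: True
```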


Joint distributions and independence

Recall from the last lecture that, if X and Y are two independent events, then P(X and Y) = P(X)P(Y). We can generalize this statement:

Independence
Two continuous (or discrete) random variables X and Y are independent if and only if

f(x, y) = fX(x) fY(y)    ⟺    F(x, y) = FX(x) FY(y),

where fX(x) and fY(y) are the marginal PDFs, and FX(x) and FY(y) denote the marginal CDFs.


Conditional distributions

Recall from the last lecture that the conditional probability is defined as P(X|Y) = P(X, Y)/P(Y). Furthermore, recall that if X and Y are two independent events, then P(X|Y) = P(X).

Conditional probability density function
Suppose that X and Y are two continuous (or discrete) random variables for which the joint PDF is f(x, y) and the marginal PDFs are fX(x) and fY(y). Suppose also that the value y has already been observed. The conditional probability density function of X given that Y = y is given by

fX(x|y) = f(x, y)/fY(y).

Note that if X and Y are independent, we get the relation fX(x|y) = fX(x).


Conditional distributions

[Figure: surface plot of the joint PDF f(x, y), with a thick black line tracing f(x, 1).]

The black, thick line shows f(x, 1), which is proportional to the conditional distribution of X given Y = 1. More specifically:

fX(x|Y = 1) = f(x, 1)/fY(1).


(Conditional) Expectations and Covariance

Example III.9
Let X and Z be two independently distributed standard normal random variables, and let Y = X² + Z. Hence, V(X) = V(Z) = 1 and E(X) = E(Z) = 0.

a) Derive E[Y|X].
b) Derive E[Y]. [Hint: use V[X] = E[X²] − (E[X])².]
c) Derive E[XY]. [Hint: since X is standard normal, E[X³] = 0.]
d) Find Cov(X, Y) = E[(X − E(X))(Y − E(Y))].

What is interesting about the results?


Independence and Covariance

Covariance is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E[XY] − E[X]E[Y].

If X and Y are independent, then E[XY] = E[X]E[Y]. Therefore, if X and Y are independent, Cov(X, Y) = 0.

However, Cov(X, Y) = 0 (and Corr(X, Y) = 0) does not imply independence, as demonstrated in the previous example.
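Example III.9 can be illustrated by simulation: for Y = X² + Z, the sample covariance of X and Y is (close to) zero even though Y clearly depends on X. A minimal sketch using only the standard library:

```python
import random

random.seed(1)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
zs = [random.gauss(0, 1) for _ in range(n)]
ys = [x ** 2 + z for x, z in zip(xs, zs)]  # Y = X^2 + Z as in Example III.9

mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(abs(cov) < 0.1)   # sample covariance is near zero: True

# ...yet Y is not independent of X: its conditional mean rises with |X|.
y_far = [y for x, y in zip(xs, ys) if abs(x) > 1]
y_near = [y for x, y in zip(xs, ys) if abs(x) <= 1]
print(sum(y_far) / len(y_far) > sum(y_near) / len(y_near))  # True
```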


Summary

I Random variables are either discrete or continuous. Random variables are usually denoted by capital letters, e.g. X, and realisations by small letters, e.g. x.

I For a continuous random variable, P(X = x) = 0 and f(x) ≠ P(X = x), where f(x) denotes the probability density function.

I Many probability distributions are closely related. For example, we can derive the Poisson distribution from the binomial distribution, and the Poisson distribution behaves similarly to the normal distribution as λ → ∞.

I Independence implies Cov(X, Y) = 0 (and Corr(X, Y) = 0), but not the other way around. Cov(X, Y) and Corr(X, Y) measure the strength of the linear relation between two variables.
