B.1 Random Variables
B.2 Probability Distributions
B.3 Joint, Marginal and Conditional Probability Distributions
B.4 Properties of Probability Distributions
B.5 Some Important Probability Distributions
A random variable is a variable whose value is unknown until it is observed.
A discrete random variable can take only a limited, or countable, number of values.
A continuous random variable can take any value on an interval.
The probability of an event is its “limiting relative frequency,” or the proportion of time it occurs in the long run.
The probability density function (pdf) for a discrete random variable indicates the probability of each possible value occurring:

$f(x) = P(X = x)$, with $f(x_1) + f(x_2) + \cdots + f(x_n) = 1$
The cumulative distribution function (cdf) is an alternative way to
represent probabilities. The cdf of the random variable X, denoted
F(x), gives the probability that X is less than or equal to a specific
value x:

$F(x) = P(X \le x)$
For example, a binomial random variable X is the number of
successes in n independent trials of identical experiments with
probability of success p.
$P(X = x) = f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$

where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ and $n! = n(n-1)(n-2)\cdots(2)(1)$
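As a quick check, the binomial pdf can be computed directly; a minimal Python sketch (the function name binom_pmf is just for illustration):

```python
from math import comb

def binom_pmf(n: int, x: int, p: float) -> float:
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# With n = 3 and p = 0.7, the pmf sums to 1 over x = 0, 1, 2, 3.
total = sum(binom_pmf(3, x, 0.7) for x in range(4))
```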
For example, suppose we know that the MUN basketball team has a 70% chance of winning each game (p = 0.7) and we want to know how likely they are to win at least 2 games in the next 3 weeks.
First, the probability of winning only once in three weeks is 0.189:

$\dfrac{n!}{x!\,(n-x)!} = \dfrac{3!}{1!\,(3-1)!} = 3$

$p^x (1-p)^{n-x} = 0.7^1 \times 0.3^2 = 0.063$

$3 \times 0.063 = 0.189$
The likelihood of winning exactly 2 games, no more and no fewer:

$\dfrac{n!}{x!\,(n-x)!} = \dfrac{3!}{2!\,(3-2)!} = \dfrac{3!}{2!\,1!} = \dfrac{3 \times 2 \times 1}{(2 \times 1) \times 1} = 3$

$p^x (1-p)^{n-x} = 0.7^2 \times 0.3^1 = 0.147$
So 3 times 0.147 = 0.441 is the likelihood of winning exactly 2 games
And 0.343 is the likelihood of winning exactly 3 games
$\dfrac{n!}{x!\,(n-x)!} = \dfrac{3!}{3!\,(3-3)!} = \dfrac{3!}{3!\,0!} = 1$

$p^x (1-p)^{n-x} = 0.7^3 \times 0.3^0 = 0.343$
For winning only once in three weeks: likelihood is 0.189
0.441 is the likelihood of winning exactly 2 games
0.343 is the likelihood of winning exactly 3 games
So 0.441 + 0.343 = 0.784 is how likely they are to win at least 2 games in the next 3 weeks
In Stata, binomial(n,k,p) returns the probability of k or fewer successes, so binomial(3,1,0.7) is the likelihood of winning 1 game or less. We were therefore looking for:

di 1 - binomial(3,1,0.7)
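The whole calculation can also be reproduced in Python (a sketch, not Stata), summing the pmf directly and via the complement of the lower tail:

```python
from math import comb

def pmf(n, x, p):
    # binomial probability of exactly x wins in n games
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(X >= 2) two ways: directly, and as 1 - P(X <= 1)
p_at_least_2 = pmf(3, 2, 0.7) + pmf(3, 3, 0.7)
p_via_complement = 1 - (pmf(3, 0, 0.7) + pmf(3, 1, 0.7))
```

Both expressions give 0.784, matching the hand calculation above.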
For example, consider two variables describing a population:

X = 1 if high school diploma or less
    2 if some college
    3 if four-year college degree
    4 if advanced degree

Y = 0 if had no money earnings in 2002
    1 if had positive money earnings in 2002
The marginal pdfs are obtained by summing the joint pdf over the other variable:

$f_X(x) = \sum_y f(x, y)$ for each value $X$ can take

$f_Y(y) = \sum_x f(x, y)$ for each value $Y$ can take

Here, $f_Y(y) = \sum_{x=1}^{4} f(x, y)$ for $y = 0, 1$; for instance, summing one row of the joint table gives $.19 + .06 + .04 + .02 = .31$.
The conditional probability that $Y = y$ given $X = x$ is

$f(y \mid x) = P(Y = y \mid X = x) = \dfrac{P(Y = y,\, X = x)}{P(X = x)} = \dfrac{f(x, y)}{f_X(x)}$

For example, the conditional pdf $f(y \mid X = 3)$:

$y = 0$: $.04/.18 = .22$
$y = 1$: $.14/.18 = .78$
Two random variables are statistically independent if the conditional probability that $Y = y$ given $X = x$ is the same as the unconditional probability that $Y = y$:

$P(Y = y \mid X = x) = P(Y = y)$

Equivalently,

$f(y \mid x) = \dfrac{f(x, y)}{f_X(x)} = f_Y(y)$

so the joint pdf factors into the product of the marginals:

$f(x, y) = f_X(x)\, f_Y(y)$
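These definitions are easy to check numerically. Below is a minimal Python sketch: only the x = 3 column of the joint pdf (.04 and .14) comes from the example above; the remaining entries are invented so the table sums to one.

```python
# Joint pdf f(x, y); the x = 3 column matches the example above,
# the other entries are illustrative values chosen to sum to 1.
joint = {
    (1, 0): 0.19, (2, 0): 0.06, (3, 0): 0.04, (4, 0): 0.02,
    (1, 1): 0.24, (2, 1): 0.21, (3, 1): 0.14, (4, 1): 0.10,
}

def marginal_x(x):
    # f_X(x) = sum over y of f(x, y)
    return sum(p for (xi, y), p in joint.items() if xi == x)

def cond_y_given_x(y, x):
    # f(y | x) = f(x, y) / f_X(x)
    return joint[(x, y)] / marginal_x(x)

fx3 = marginal_x(3)               # .04 + .14 = .18
f0_given_3 = cond_y_given_x(0, 3) # .04/.18, about .22
f1_given_3 = cond_y_given_x(1, 3) # .14/.18, about .78
```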
B.4.1 Mean, median and mode
For a discrete random variable the expected value is:
$E[X] = x_1 P(X = x_1) + x_2 P(X = x_2) + \cdots + x_n P(X = x_n)$
$= x_1 f(x_1) + x_2 f(x_2) + \cdots + x_n f(x_n) = \sum_{i=1}^{n} x_i f(x_i) = \sum_x x f(x)$

where $f$ is the discrete pdf of $X$.
For a continuous random variable the expected value is:

$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx$

The mean has a flaw as a measure of the center of a probability distribution in that it can be pulled by extreme values.

For a continuous distribution, the median of $X$ is the value $m$ such that

$P(X \ge m) = P(X \le m) = 0.5$

In symmetric distributions, like the familiar “bell-shaped curve” of the normal distribution, the mean and median are equal.

The mode is the value of $X$ at which the pdf is highest.
For any function $g$ of $X$:

$E[g(X)] = \sum_x g(x) f(x)$

In particular, for $g(X) = aX$:

$E[g(X)] = \sum_x g(x) f(x) = \sum_x a x f(x) = a \sum_x x f(x) = a E[X]$, so $E[aX] = a E[X]$
The variance of a discrete or continuous random variable $X$ is the expected value of

$g(X) = [X - E(X)]^2$

The variance of a random variable is important in characterizing the scale of measurement and the spread of the probability distribution. Algebraically, letting $E(X) = \mu$:

$\operatorname{var}(X) = \sigma^2 = E[(X - \mu)^2] = E(X^2) - \mu^2$
For a function $g$ of two random variables:

$E[g(X, Y)] = \sum_x \sum_y g(x, y) f(x, y)$

The expected value of a sum is the sum of the expected values:

$E(X + Y) = E(X) + E(Y)$

since

$E(X + Y) = \sum_x \sum_y (x + y) f(x, y) = \sum_x \sum_y x f(x, y) + \sum_x \sum_y y f(x, y)$
$= \sum_x x \sum_y f(x, y) + \sum_y y \sum_x f(x, y) = \sum_x x f_X(x) + \sum_y y f_Y(y) = E(X) + E(Y)$

More generally, for constants $a$, $b$ and $c$:

$E(aX + bY + c) = aE(X) + bE(Y) + c$
, ,
if and are independent.
x y x y
x y
E XY E g X Y xyf x y xyf x f y
xf x yf y E X E Y X Y
( , ) ( )( )X Yg X Y X Y
If X and Y are independent random variables then the covariance and correlation between them are zero. The converse of this relationship is not true.

$\operatorname{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$

$\rho_{XY} = \dfrac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Covariance and correlation coefficient
The correlation coefficient is a measure of linear correlation between the variables
Its values range from -1 (perfect negative correlation) to 1 (perfect positive correlation)
If a and b are constants then:
$\operatorname{var}(aX + bY) = a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y) + 2ab\, \operatorname{cov}(X, Y)$

$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y)$

$\operatorname{var}(X - Y) = \operatorname{var}(X) + \operatorname{var}(Y) - 2\operatorname{cov}(X, Y)$
So:

$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\sigma_{XY}$

Why is that? Expanding $E\{[(X - \mu_X) + (Y - \mu_Y)]^2\}$ produces the two variance terms plus the cross term $2E[(X - \mu_X)(Y - \mu_Y)] = 2\sigma_{XY}$ (and of course the same happens for the case of var(X - Y)).
If X and Y are independent then:

$\operatorname{var}(aX + bY) = a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y)$

$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y)$

$\operatorname{var}(X + Y + Z) = \operatorname{var}(X) + \operatorname{var}(Y) + \operatorname{var}(Z)$

Otherwise these expressions would also have to include, as summands, twice each of the (non-zero) pairwise covariances between the variables.
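The variance-of-a-sum identity can be verified numerically on any small joint distribution; the probabilities in this Python sketch are made up purely for illustration:

```python
# A made-up joint pmf over x in {0, 1}, y in {0, 1, 2}; probabilities sum to 1.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.1,
       (1, 0): 0.2, (1, 1): 0.1, (1, 2): 0.3}

def E(g):
    # expected value of g(x, y) under the joint pmf
    return sum(p * g(x, y) for (x, y), p in pmf.items())

var_x  = E(lambda x, y: x**2) - E(lambda x, y: x)**2
var_y  = E(lambda x, y: y**2) - E(lambda x, y: y)**2
cov_xy = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

# Direct computation of var(X + Y), to compare against the identity.
var_sum = E(lambda x, y: (x + y)**2) - E(lambda x, y: x + y)**2
```

Here var_sum agrees with var_x + var_y + 2*cov_xy, as the formula says.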
For example, for a discrete random variable with pdf $f(1) = .1$, $f(2) = .2$, $f(3) = .3$, $f(4) = .4$:

$E[X] = \sum_{x=1}^{4} x f(x) = 1(.1) + 2(.2) + 3(.3) + 4(.4) = 3$

$\sigma_X^2 = E[(X - \mu_X)^2] = (1-3)^2(.1) + (2-3)^2(.2) + (3-3)^2(.3) + (4-3)^2(.4)$
$= 4(.1) + 1(.2) + 0(.3) + 1(.4) = 1$
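The same mean and variance calculation takes only a few lines of Python (using the pdf from the example above):

```python
# The discrete pdf from the example: f(1)=.1, f(2)=.2, f(3)=.3, f(4)=.4
pdf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

mean = sum(x * p for x, p in pdf.items())              # E[X] = 3
var  = sum((x - mean)**2 * p for x, p in pdf.items())  # var(X) = 1
```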
B.5.1 The Normal Distribution
If X is a normally distributed random variable with mean μ and variance σ², it can be symbolized as $X \sim N(\mu, \sigma^2)$. Its pdf is

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$
A standard normal random variable is one that has a normal probability density function with mean 0 and variance 1. Standardizing X gives

$Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$

The cdf for the standardized normal variable Z is

$\Phi(z) = P(Z \le z)$
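There is no closed form for Φ(z), but it can be evaluated through the error function; a minimal Python sketch (the name phi is just for illustration):

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal cdf, Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

For instance, phi(0) is exactly 0.5, and phi(1.96) is roughly 0.975, the familiar two-sided 5% critical value.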
A weighted sum of normal random variables has a normal
distribution.
$X_1 \sim N(\mu_1, \sigma_1^2), \qquad X_2 \sim N(\mu_2, \sigma_2^2)$

$Y = a_1 X_1 + a_2 X_2 \sim N(\mu_Y, \sigma_Y^2), \quad \mu_Y = a_1\mu_1 + a_2\mu_2, \quad \sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1 a_2 \sigma_{12}$
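A small Python sketch of the mean and variance of the weighted sum (the function name and the example numbers are made up for illustration):

```python
def weighted_sum_params(a1, mu1, var1, a2, mu2, var2, cov12):
    """Mean and variance of Y = a1*X1 + a2*X2 for jointly normal X1, X2."""
    mu_y = a1 * mu1 + a2 * mu2
    var_y = a1**2 * var1 + a2**2 * var2 + 2 * a1 * a2 * cov12
    return mu_y, var_y

# e.g. X1 ~ N(1, 4), X2 ~ N(2, 9), cov(X1, X2) = 1, and Y = 2*X1 + 3*X2
mu_y, var_y = weighted_sum_params(2, 1, 4, 3, 2, 9, 1)
# mu_y = 2*1 + 3*2 = 8;  var_y = 4*4 + 9*9 + 2*2*3*1 = 109
```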
A “t” random variable (no upper case) is formed by dividing a standard normal random variable by the square root of an independent chi-square random variable, $V \sim \chi^2_{(m)}$, that has been divided by its degrees of freedom m:

$t = \dfrac{Z}{\sqrt{V/m}} \sim t_{(m)}, \qquad Z \sim N(0, 1), \qquad V \sim \chi^2_{(m)}$
An F random variable is formed by the ratio of two independent chi-square random variables that have been divided by their degrees of freedom:

$F = \dfrac{V_1/m_1}{V_2/m_2} \sim F_{(m_1, m_2)}, \qquad V_1 \sim \chi^2_{(m_1)}, \qquad V_2 \sim \chi^2_{(m_2)}$
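Both constructions can be simulated directly from their definitions; a Python sketch using only the standard library (the sample sizes, degrees of freedom and tolerances are arbitrary choices):

```python
import random

random.seed(0)

def chi2(m):
    # chi-square with m degrees of freedom: sum of m squared standard normals
    return sum(random.gauss(0, 1)**2 for _ in range(m))

def t_draw(m):
    # t: standard normal over the square root of an independent chi2/m
    return random.gauss(0, 1) / (chi2(m) / m) ** 0.5

def f_draw(m1, m2):
    # F: ratio of two independent chi-squares, each divided by its df
    return (chi2(m1) / m1) / (chi2(m2) / m2)

n = 20_000
t_mean = sum(t_draw(10) for _ in range(n)) / n      # near 0 (t is symmetric)
f_mean = sum(f_draw(5, 10) for _ in range(n)) / n   # near m2/(m2-2) = 1.25
```

With the seed fixed, the sample means land close to the theoretical values 0 and m2/(m2 - 2).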