236607 Visual Recognition Tutorial — Random variables, distributions, and probability density functions


Page 1: Contents

• Random variables, distributions, and probability density functions

• Discrete Random Variables

• Continuous Random Variables

• Expected Values and Moments

• Joint and Marginal Probability

• Means and variances

• Covariance matrices

• Univariate normal density

• Multivariate Normal densities

Page 2: Random variables, distributions, and probability density functions

A random variable X is a variable whose value is set as a consequence of random events, that is, events whose outcomes cannot be known in advance. The set of all possible outcomes is called the sample space and is denoted by Ω. Such a random variable can be treated as a "nondeterministic" function X : Ω → R which relates every possible random event ω ∈ Ω with some value X(ω). We will be dealing with real-valued random variables.

The probability distribution function is a function F : R → [0, 1] for which

F(x) = Pr(X ≤ x)   for every x.

Page 3: Discrete Random Variable

Let X be a discrete random variable (d.r.v.) that can assume m different values in the countable set {v_1, v_2, …, v_m}.

Let p_i be the probability that X assumes the value v_i:

p_i = Pr{X = v_i},  i = 1, …, m.

The p_i must satisfy:

p_i ≥ 0, and ∑_{i=1}^{m} p_i = 1.

The mass function P satisfies

P(x) ≥ 0, and ∑_x P(x) = 1.

The connection between the distribution function and the mass function is given by

F(x) = ∑_{y ≤ x} P(y),  P(x) = F(x) − lim_{y → x, y < x} F(y).
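As a small sketch (the mass function below is invented for illustration), the distribution function of a d.r.v. can be built directly from its mass function by summing P(y) over all values y ≤ x:

```python
# A tiny sketch (values invented): building the distribution function of a
# d.r.v. from its mass function, F(x) = sum over y <= x of P(y).
mass = {1: 0.2, 2: 0.5, 4: 0.3}          # P(v_i), non-negative, summing to 1

def F(x):
    """F(x) = Pr(X <= x), a step function that jumps by P(v) at each v."""
    return sum(p for v, p in mass.items() if v <= x)

print(F(0), F(1), F(3), F(4))             # starts at 0 and increases to 1
```

Note that F is a right-continuous step function: the jump at each value v_i recovers the mass P(v_i), matching the limit formula on the slide.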

Page 4: Continuous Random Variable

The domain of a continuous random variable (c.r.v.) is uncountable.

The distribution function of a c.r.v. can be defined as

F(x) = ∫_{−∞}^{x} p(y) dy,

where the function p(x) is called a probability density function. It is important to mention that the numerical value of p(x) is not a "probability of x". In the continuous case, p(x)dx is a value which approximately equals the probability Pr[x < X ≤ x + dx]:

Pr[x < X ≤ x + dx] = F(x + dx) − F(x) ≈ p(x) dx.

Page 5: Continuous Random Variable

Important features of the probability density function:

∫_{−∞}^{∞} p(x) dx = 1

∀x ∈ R: Pr(X = x) = 0

Pr(a < X < b) = ∫_{a}^{b} p(x) dx
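These properties can be checked numerically for a concrete density. The sketch below (the density p(x) = 2x on [0, 1] is my own choice, not from the tutorial) verifies that the density integrates to 1 and that Pr(a < X < b) matches the closed-form integral:

```python
# A numeric sketch (density chosen arbitrarily): checking the density
# properties on the slide for p(x) = 2x on [0, 1], via the trapezoid rule.
import numpy as np

xs = np.linspace(0.0, 1.0, 100_001)          # grid over the support
p = 2.0 * xs                                 # p(x) = 2x, a valid density on [0, 1]

dx = np.diff(xs)
total = np.sum((p[:-1] + p[1:]) / 2 * dx)    # trapezoid rule, exact for linear p
print(total)                                 # integral of p over the domain

a, b = 0.2, 0.7                              # here Pr(a < X < b) = b**2 - a**2
mask = (xs >= a) & (xs <= b)
seg_p, seg_x = p[mask], xs[mask]
prob = np.sum((seg_p[:-1] + seg_p[1:]) / 2 * np.diff(seg_x))
print(prob)
```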

Page 6: Expected Values and Moments

The mean (or expected value, or average) of x is defined by

E[x] = ∑_x x P(x) = ∑_{i=1}^{m} v_i p_i  for a d.r.v.,
E[x] = ∫ x p(x) dx  for a c.r.v.

If Y = g(X) we have:

E[Y] = E[g(X)] = ∑_{x: P(x) ≠ 0} g(x) P(x)  for a d.r.v.,
E[g(X)] = ∫ g(x) p(x) dx  for a c.r.v.

The variance is defined as

var(X) = σ² = E[(X − μ)²] = ∑_x (x − μ)² P(x) = E[x²] − (E[x])²,

where σ is the standard deviation of x.
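The d.r.v. formulas above are easy to compute directly. A minimal sketch (the values and probabilities below are invented), which also checks the identity var(X) = E[x²] − (E[x])²:

```python
# A minimal sketch (example distribution invented): expected value and variance
# of a d.r.v., checking the identity var(X) = E[x^2] - (E[x])^2 from the slide.

values = [1, 2, 3, 4]              # v_i
probs  = [0.1, 0.2, 0.3, 0.4]      # p_i, non-negative and summing to 1
assert all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < 1e-12

def expectation(g):
    """E[g(X)] = sum_i g(v_i) * p_i  (the d.r.v. formula from the slide)."""
    return sum(g(v) * p for v, p in zip(values, probs))

mean = expectation(lambda x: x)                      # E[x]
var_def = expectation(lambda x: (x - mean) ** 2)     # E[(x - mu)^2]
var_alt = expectation(lambda x: x ** 2) - mean ** 2  # E[x^2] - (E[x])^2

print(mean, var_def, var_alt)
```

The same `expectation` helper evaluates E[g(X)] for any g, illustrating the Y = g(X) rule on the slide.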

Page 7: Expected Values and Moments

Intuitively, the variance of x indicates the spread of its samples around its expected value (mean). An important property of the mean is its linearity:

E[aX + bY] = aE[X] + bE[Y].

At the same time, the variance is not linear:

var(aX) = a² var(X).

• The k-th moment of a r.v. X is E[Xᵏ] (the expected value is the first moment). The k-th central moment is

μ_k = E[(X − E[X])ᵏ].
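Both properties hold exactly for sample means and sample variances as well, which gives a quick numeric check (the data and coefficients below are synthetic):

```python
# A quick numeric sketch (synthetic data): the mean is linear in its arguments,
# while the variance of aX scales with a**2, as stated on the slide.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(2.0, 1.5, size=200_000)
Y = rng.normal(-1.0, 0.5, size=200_000)
a, b = 3.0, -2.0

# E[aX + bY] = a E[X] + b E[Y]  (holds exactly for sample means)
assert np.isclose((a * X + b * Y).mean(), a * X.mean() + b * Y.mean())

# var(aX) = a**2 var(X)  (holds exactly for sample variances)
assert np.isclose((a * X).var(), a ** 2 * X.var())
print((a * X).var() / X.var())            # ~9.0, i.e. a**2
```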

Page 8: Joint and Marginal Probability

Let X and Y be 2 random variables with domains {v_1, …, v_m} and {w_1, …, w_n}.

For each pair of values (v_i, w_j) we have a joint probability

p_ij = Pr{X = v_i, Y = w_j}.

The joint mass function satisfies

P(x, y) ≥ 0, and ∑_x ∑_y P(x, y) = 1.

The marginal distributions for x and y are defined as

P_x(x) = ∑_y P(x, y), and P_y(y) = ∑_x P(x, y)  for d.r.v.

For c.r.v. marginal distributions can be calculated as

P_X(x) = ∫ P(x, y) dy.
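For discrete variables, marginalization is just summing the joint table over the other variable. A small sketch (the joint probabilities below are invented):

```python
# A small sketch (example numbers invented): a joint mass function for two
# d.r.v. stored as a table, with marginals computed by summing out the other
# variable, as on the slide.

joint = {                      # P(x, y) for x in {0, 1}, y in {0, 1, 2}
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12   # joint probabilities sum to 1

def marginal_x(x):
    """P_x(x) = sum_y P(x, y)"""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """P_y(y) = sum_x P(x, y)"""
    return sum(p for (_, yi), p in joint.items() if yi == y)

print(marginal_x(0), marginal_x(1))
print(marginal_y(0), marginal_y(1), marginal_y(2))
```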

Page 9: Means and variances

The variables x and y are said to be statistically independent if and only if

P(x, y) = P_x(x) P_y(y).

The expected value of a function f(x, y) of two random variables x and y is defined as

E[f(x, y)] = ∑_x ∑_y f(x, y) P(x, y)  or  ∫∫ f(x, y) P(x, y) dx dy.

The means and variances are:

μ_x = E[x] = ∑_x ∑_y x P(x, y)
μ_y = E[y] = ∑_x ∑_y y P(x, y)
σ_x² = V[x] = E[(x − μ_x)²] = ∑_x ∑_y (x − μ_x)² P(x, y)
σ_y² = V[y] = E[(y − μ_y)²] = ∑_x ∑_y (y − μ_y)² P(x, y).

Page 10: Covariance matrices

The covariance matrix Σ is defined as the square matrix

Σ = E[(x − μ)(x − μ)^t],  where μ = E[x] = (E[x_1], E[x_2], …, E[x_d])^t,

whose ij-th element σ_ij is the covariance of x_i and x_j:

σ_ij = cov(x_i, x_j) = E[(x_i − μ_i)(x_j − μ_j)],  i, j = 1, …, d.
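Empirically, this definition amounts to averaging outer products of centered samples. A sketch with NumPy (the data is synthetic; `numpy.cov` implements the same definition, with an unbiased 1/(n−1) normalization unless `bias=True` is passed):

```python
# A sketch (synthetic data): estimating the covariance matrix as the average
# outer product E[(x - mu)(x - mu)^t], and comparing with numpy.cov.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # 500 samples of a 3-dimensional r.v.

mu = X.mean(axis=0)                    # sample mean vector
centered = X - mu
sigma = centered.T @ centered / len(X) # E[(x - mu)(x - mu)^t], empirically

assert np.allclose(sigma, np.cov(X.T, bias=True))
assert np.allclose(sigma, sigma.T)     # a covariance matrix is symmetric
print(sigma.shape)                     # (3, 3)
```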

Page 11: Cauchy-Schwarz inequality

Expanding the variance of a sum,

var(X ± Y) = E[(X ± Y − E(X ± Y))²] = E[((X − E(X)) ± (Y − E(Y)))²]
           = E[(X − μ_x)²] ± 2E[(X − μ_x)(Y − μ_y)] + E[(Y − μ_y)²]
           = σ_x² ± 2σ_xy + σ_y² ≥ 0.

Applying this to the standardized variables X/σ_x and Y/σ_y gives 2(1 ± ρ) ≥ 0, and from this we have the Cauchy-Schwarz inequality

σ_xy² ≤ σ_x² σ_y².

The correlation coefficient is the normalized covariance

ρ(x, y) = σ_xy / (σ_x σ_y).

It always satisfies −1 ≤ ρ(x, y) ≤ 1. If ρ(x, y) = 0, the variables x and y are uncorrelated. If y = ax + b and a > 0, then ρ(x, y) = 1. If a < 0, then ρ(x, y) = −1.

Question. Prove that if X and Y are independent r.v., then ρ(x, y) = 0.
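The linear cases ρ = ±1 are easy to see numerically (the data and coefficients below are synthetic; `numpy.corrcoef` computes the normalized covariance defined above):

```python
# A sketch (synthetic data) illustrating the correlation coefficient claims on
# the slide: rho = +1 for y = a*x + b with a > 0, and rho = -1 for a < 0.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

rho_pos = np.corrcoef(x, 3.0 * x + 2.0)[0, 1]   # a > 0  -> rho = +1
rho_neg = np.corrcoef(x, -3.0 * x + 2.0)[0, 1]  # a < 0  -> rho = -1

print(rho_pos, rho_neg)
```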

Page 12: Covariance matrices

If the variables are statistically independent, the covariances are zero, and the covariance matrix is diagonal:

      | σ_11  σ_12  …  σ_1d |   | σ_1²  σ_12  …  σ_1d |
Σ =   | σ_21  σ_22  …  σ_2d | = | σ_21  σ_2²  …  σ_2d |
      |  ⋮     ⋮    ⋱    ⋮  |   |  ⋮     ⋮    ⋱    ⋮  |
      | σ_d1  σ_d2  …  σ_dd |   | σ_d1  σ_d2  …  σ_d² |

The covariance matrix is positive semi-definite: if w is any d-dimensional vector, then w^t Σ w ≥ 0. This is equivalent to the requirement that none of the eigenvalues of Σ can ever be negative.

Page 13: Univariate normal density

The normal or Gaussian probability function is very important. In the 1-dimensional case, it is defined by the probability density function

p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).

The normal density is described as a "bell-shaped curve", and it is completely determined by μ, σ.

The probabilities obey

Pr[|x − μ| ≤ σ]  ≈ 0.68
Pr[|x − μ| ≤ 2σ] ≈ 0.95
Pr[|x − μ| ≤ 3σ] ≈ 0.997
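These three probabilities follow from the standard identity Pr[|x − μ| ≤ kσ] = erf(k/√2), which can be checked directly (a standard result, not tutorial code):

```python
# A quick numeric check of the 68-95-99.7 rule on the slide: for a normal
# density, Pr[|x - mu| <= k*sigma] = erf(k / sqrt(2)).
import math

for k, approx in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    prob = math.erf(k / math.sqrt(2.0))
    print(f"Pr[|x - mu| <= {k} sigma] = {prob:.4f} (slide: ~{approx})")
    assert abs(prob - approx) < 0.01
```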

Page 14: Multivariate Normal densities

Suppose that each of the d random variables x_i is normally distributed, each with its own mean and variance: p(x_i) ~ N(μ_i, σ_i²). If these variables are independent, their joint density has the form

p(x) = ∏_{i=1}^{d} p(x_i) = ∏_{i=1}^{d} (1 / (√(2π) σ_i)) exp(−(x_i − μ_i)² / (2σ_i²))
     = (1 / ((2π)^{d/2} ∏_{i=1}^{d} σ_i)) exp(−(1/2) ∑_{i=1}^{d} ((x_i − μ_i)/σ_i)²).

This can be written in a compact matrix form if we observe that for this case the covariance matrix is diagonal.

Page 15: Covariance matrices

• The covariance matrix is diagonal, i.e.,

Σ = diag(σ_1², σ_2², …, σ_d²),

and hence the inverse of the covariance matrix is easily written as

Σ⁻¹ = diag(1/σ_1², 1/σ_2², …, 1/σ_d²).

Page 16: Covariance matrices

and

(x − μ)^t Σ⁻¹ (x − μ) = ∑_{i=1}^{d} ((x_i − μ_i)/σ_i)².

• Finally, by noting that the determinant of Σ is just the product of the variances, |Σ| = ∏_{i=1}^{d} σ_i², we can write the joint density in the form

p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^t Σ⁻¹ (x − μ)).

• This is the general form of a multivariate normal density function, where the covariance matrix is no longer required to be diagonal.
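The general formula is a few lines of NumPy. The sketch below (my own helper, with invented parameter values) implements it and checks that for a diagonal covariance it factorizes into the product of univariate densities, as derived above:

```python
# A sketch implementing the general multivariate normal density formula from
# the slide, and checking the diagonal-covariance factorization.
import numpy as np

def mvn_density(x, mu, sigma):
    """p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^t Sigma^-1 (x-mu))"""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(sigma) @ diff        # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([1.0, -2.0])
sigma = np.diag([4.0, 9.0])                          # independent components
x = np.array([0.5, 0.0])

joint = mvn_density(x, mu, sigma)
product = np.prod([np.exp(-(xi - mi) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
                   for xi, mi, v in zip(x, mu, np.diag(sigma))])
print(joint, product)
assert np.isclose(joint, product)
```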

Page 17: Covariance matrices

The natural measure of the distance from x to the mean μ is provided by the quantity

r² = (x − μ)^t Σ⁻¹ (x − μ),

which is the square of the Mahalanobis distance from x to μ.

Page 18: Example: Bivariate Normal Density

    | σ_11  σ_12 |   | σ_1²    ρσ_1σ_2 |
Σ = | σ_21  σ_22 | = | ρσ_1σ_2 σ_2²    |,

where ρ = σ_12 / (σ_1 σ_2) is the correlation coefficient, |ρ| ≤ 1; thus

|Σ| = σ_1² σ_2² (1 − ρ²),

Σ⁻¹ = (1 / (σ_1² σ_2² (1 − ρ²))) | σ_2²      −ρσ_1σ_2 |
                                 | −ρσ_1σ_2  σ_1²     |,

and after doing the dot products in (x − μ)^T Σ⁻¹ (x − μ), with x − μ = (x_1 − μ_1, x_2 − μ_2)^T, we get the expression for the bivariate normal density:

p(x_1, x_2) = (1 / (2π σ_1 σ_2 √(1 − ρ²)))
              exp{ −(1 / (2(1 − ρ²))) [ ((x_1 − μ_1)/σ_1)² − 2ρ ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)² ] }.
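As a sanity check (the parameter values below are invented), the expanded bivariate formula should agree with the general matrix form of the density from page 16:

```python
# A sketch checking that the bivariate formula on the slide matches the general
# multivariate normal density in matrix form.
import numpy as np

mu1, mu2, s1, s2, rho = 0.5, -1.0, 2.0, 3.0, 0.6
x1, x2 = 1.0, 0.0

# Bivariate formula from the slide
z = ((x1 - mu1) / s1) ** 2 \
    - 2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2) \
    + ((x2 - mu2) / s2) ** 2
p_biv = np.exp(-z / (2 * (1 - rho ** 2))) \
        / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# General form p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^t Sigma^-1 (x-mu))
mu = np.array([mu1, mu2])
sigma = np.array([[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]])
diff = np.array([x1, x2]) - mu
p_gen = np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) \
        / ((2 * np.pi) * np.sqrt(np.linalg.det(sigma)))

print(p_biv, p_gen)
assert np.isclose(p_biv, p_gen)
```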

Page 19: Some Geometric Features

The level curves of the 2D Gaussian are ellipses; the principal axes are in the directions of the eigenvectors of Σ, and the different widths correspond to the corresponding eigenvalues.

For uncorrelated r.v. (ρ = 0) the axes are parallel to the coordinate axes.

For the extreme case of |ρ| = 1 the ellipses collapse into straight lines (in fact there is only one independent r.v.).

Marginal and conditional densities are unidimensional normal.

Page 20: Some Geometric Features

(Figures: level curves of 2D Gaussian densities.)

Page 21: Law of Large Numbers and Central Limit Theorem

Law of large numbers. Let X_1, X_2, … be a series of i.i.d. (independent and identically distributed) random variables with E[X_i] = μ. Then for S_n = X_1 + … + X_n,

lim_{n→∞} S_n / n = μ.

Central Limit Theorem. Let X_1, X_2, … be a series of i.i.d. r.v. with E[X_i] = μ and variance var(X_i) = σ². Then for S_n = X_1 + … + X_n,

(S_n − nμ) / (σ √n) →ᴰ N(0, 1).
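Both theorems are easy to illustrate by simulation (the uniform distribution and all sample sizes below are my own choices):

```python
# A simulation sketch of both theorems: the sample mean of i.i.d. uniform
# variables approaches mu, and the standardized sum is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)      # mean and std of Uniform(0, 1)

# Law of large numbers: S_n / n is close to mu for large n
n = 100_000
S_n = rng.uniform(0.0, 1.0, size=n).sum()
print(S_n / n)                            # close to 0.5

# Central limit theorem: standardized sums over many independent trials
trials = rng.uniform(0.0, 1.0, size=(2000, 500))
z = (trials.sum(axis=1) - 500 * mu) / (sigma * np.sqrt(500))
print(z.mean(), z.std())                  # roughly 0 and 1
```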