236607 Visual Recognition Tutorial — Random variables, distributions, and probability density functions


Page 1: Contents

• Random variables, distributions, and probability density functions

• Discrete Random Variables

• Continuous Random Variables

• Expected Values and Moments

• Joint and Marginal Probability

• Means and variances

• Covariance matrices

• Univariate normal density

• Multivariate Normal densities

Page 2: Random variables, distributions, and probability density functions

A random variable X is a variable whose value is set as a consequence of random events, that is, events whose outcomes cannot be known in advance. The set of all possible outcomes is called the sample space and is denoted by Ω. Such a random variable can be treated as a "nondeterministic" function X : Ω → R which relates every possible random event ω ∈ Ω with some value X(ω). We will be dealing with real-valued random variables.

The probability distribution function is a function F : R → [0, 1] for which

F(x) = Pr(X ≤ x)   for every x.

Page 3: Discrete Random Variable

Let X be a discrete random variable (d.r.v.) that can assume m different values in the countable set {v_1, v_2, …, v_m}.

Let p_i be the probability that X assumes the value v_i:

p_i = Pr{X = v_i},  i = 1, …, m.

The p_i must satisfy:

p_i ≥ 0, and ∑_{i=1}^{m} p_i = 1.

The mass function P satisfies

P(x) ≥ 0, and ∑_x P(x) = 1.

The connection between the distribution function and the mass function is given by

F(x) = ∑_{y ≤ x} P(y),  P(x) = F(x) − lim_{y → x, y < x} F(y).
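As a small sketch (the mass function below is invented for illustration), the distribution function of a d.r.v. can be built directly from its mass function by summing P(y) over all values y ≤ x:

```python
# A tiny sketch (values invented): building the distribution function of a
# d.r.v. from its mass function, F(x) = sum over y <= x of P(y).
mass = {1: 0.2, 2: 0.5, 4: 0.3}          # P(v_i), non-negative, summing to 1

def F(x):
    """F(x) = Pr(X <= x), a step function that jumps by P(v) at each v."""
    return sum(p for v, p in mass.items() if v <= x)

print(F(0), F(1), F(3), F(4))             # starts at 0 and increases to 1
```

Note that F is a right-continuous step function: the jump at each value v_i recovers the mass P(v_i), matching the limit formula on the slide.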

Page 4: Continuous Random Variable

The domain of a continuous random variable (c.r.v.) is uncountable.

The distribution function of a c.r.v. can be defined as

F(x) = ∫_{−∞}^{x} p(y) dy,

where the function p(x) is called a probability density function. It is important to mention that the numerical value of p(x) is not a "probability of x". In the continuous case, p(x)dx is a value which approximately equals the probability Pr[x < X ≤ x + dx]:

Pr[x < X ≤ x + dx] = F(x + dx) − F(x) ≈ p(x) dx.

Page 5: Continuous Random Variable

Important features of the probability density function:

∫_{−∞}^{∞} p(x) dx = 1

∀x ∈ R: Pr(X = x) = 0

Pr(a < X < b) = ∫_{a}^{b} p(x) dx
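These properties can be checked numerically for a concrete density. The sketch below (the density p(x) = 2x on [0, 1] is my own choice, not from the tutorial) verifies that the density integrates to 1 and that Pr(a < X < b) matches the closed-form integral:

```python
# A numeric sketch (density chosen arbitrarily): checking the density
# properties on the slide for p(x) = 2x on [0, 1], via the trapezoid rule.
import numpy as np

xs = np.linspace(0.0, 1.0, 100_001)          # grid over the support
p = 2.0 * xs                                 # p(x) = 2x, a valid density on [0, 1]

dx = np.diff(xs)
total = np.sum((p[:-1] + p[1:]) / 2 * dx)    # trapezoid rule, exact for linear p
print(total)                                 # integral of p over the domain

a, b = 0.2, 0.7                              # here Pr(a < X < b) = b**2 - a**2
mask = (xs >= a) & (xs <= b)
seg_p, seg_x = p[mask], xs[mask]
prob = np.sum((seg_p[:-1] + seg_p[1:]) / 2 * np.diff(seg_x))
print(prob)
```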

Page 6: Expected Values and Moments

The mean (or expected value, or average) of x is defined by

E[x] = ∑_x x P(x) = ∑_{i=1}^{m} v_i p_i  for a d.r.v.,
E[x] = ∫ x p(x) dx  for a c.r.v.

If Y = g(X) we have:

E[Y] = E[g(X)] = ∑_{x: P(x) ≠ 0} g(x) P(x)  for a d.r.v.,
E[g(X)] = ∫ g(x) p(x) dx  for a c.r.v.

The variance is defined as

var(X) = σ² = E[(X − μ)²] = ∑_x (x − μ)² P(x) = E[x²] − (E[x])²,

where σ is the standard deviation of x.
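The d.r.v. formulas above are easy to compute directly. A minimal sketch (the values and probabilities below are invented), which also checks the identity var(X) = E[x²] − (E[x])²:

```python
# A minimal sketch (example distribution invented): expected value and variance
# of a d.r.v., checking the identity var(X) = E[x^2] - (E[x])^2 from the slide.

values = [1, 2, 3, 4]              # v_i
probs  = [0.1, 0.2, 0.3, 0.4]      # p_i, non-negative and summing to 1
assert all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < 1e-12

def expectation(g):
    """E[g(X)] = sum_i g(v_i) * p_i  (the d.r.v. formula from the slide)."""
    return sum(g(v) * p for v, p in zip(values, probs))

mean = expectation(lambda x: x)                      # E[x]
var_def = expectation(lambda x: (x - mean) ** 2)     # E[(x - mu)^2]
var_alt = expectation(lambda x: x ** 2) - mean ** 2  # E[x^2] - (E[x])^2

print(mean, var_def, var_alt)
```

The same `expectation` helper evaluates E[g(X)] for any g, illustrating the Y = g(X) rule on the slide.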

Page 7: Expected Values and Moments

Intuitively, the variance of x indicates the spread of its samples around its expected value (mean). An important property of the mean is its linearity:

E[aX + bY] = aE[X] + bE[Y].

At the same time, the variance is not linear:

var(aX) = a² var(X).

• The k-th moment of a r.v. X is E[Xᵏ] (the expected value is the first moment). The k-th central moment is

μ_k = E[(X − E[X])ᵏ].
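Both properties hold exactly for sample means and sample variances as well, which gives a quick numeric check (the data and coefficients below are synthetic):

```python
# A quick numeric sketch (synthetic data): the mean is linear in its arguments,
# while the variance of aX scales with a**2, as stated on the slide.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(2.0, 1.5, size=200_000)
Y = rng.normal(-1.0, 0.5, size=200_000)
a, b = 3.0, -2.0

# E[aX + bY] = a E[X] + b E[Y]  (holds exactly for sample means)
assert np.isclose((a * X + b * Y).mean(), a * X.mean() + b * Y.mean())

# var(aX) = a**2 var(X)  (holds exactly for sample variances)
assert np.isclose((a * X).var(), a ** 2 * X.var())
print((a * X).var() / X.var())            # ~9.0, i.e. a**2
```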

Page 8: Joint and Marginal Probability

Let X and Y be 2 random variables with domains {v_1, …, v_m} and {w_1, …, w_n}.

For each pair of values (v_i, w_j) we have a joint probability

p_ij = Pr{X = v_i, Y = w_j}.

The joint mass function satisfies

P(x, y) ≥ 0, and ∑_x ∑_y P(x, y) = 1.

The marginal distributions for x and y are defined as

P_x(x) = ∑_y P(x, y), and P_y(y) = ∑_x P(x, y)  for d.r.v.

For c.r.v. marginal distributions can be calculated as

P_X(x) = ∫ P(x, y) dy.
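For discrete variables, marginalization is just summing the joint table over the other variable. A small sketch (the joint probabilities below are invented):

```python
# A small sketch (example numbers invented): a joint mass function for two
# d.r.v. stored as a table, with marginals computed by summing out the other
# variable, as on the slide.

joint = {                      # P(x, y) for x in {0, 1}, y in {0, 1, 2}
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12   # joint probabilities sum to 1

def marginal_x(x):
    """P_x(x) = sum_y P(x, y)"""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """P_y(y) = sum_x P(x, y)"""
    return sum(p for (_, yi), p in joint.items() if yi == y)

print(marginal_x(0), marginal_x(1))
print(marginal_y(0), marginal_y(1), marginal_y(2))
```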

Page 9: Means and variances

The variables x and y are said to be statistically independent if and only if

P(x, y) = P_x(x) P_y(y).

The expected value of a function f(x, y) of two random variables x and y is defined as

E[f(x, y)] = ∑_x ∑_y f(x, y) P(x, y)  or  ∫∫ f(x, y) P(x, y) dx dy.

The means and variances are:

μ_x = E[x] = ∑_x ∑_y x P(x, y)
μ_y = E[y] = ∑_x ∑_y y P(x, y)
σ_x² = V[x] = E[(x − μ_x)²] = ∑_x ∑_y (x − μ_x)² P(x, y)
σ_y² = V[y] = E[(y − μ_y)²] = ∑_x ∑_y (y − μ_y)² P(x, y).

Page 10: Covariance matrices

The covariance matrix Σ is defined as the square matrix

Σ = E[(x − μ)(x − μ)^t],  where μ = E[x] = (E[x_1], E[x_2], …, E[x_d])^t,

whose ij-th element σ_ij is the covariance of x_i and x_j:

σ_ij = cov(x_i, x_j) = E[(x_i − μ_i)(x_j − μ_j)],  i, j = 1, …, d.
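Empirically, this definition amounts to averaging outer products of centered samples. A sketch with NumPy (the data is synthetic; `numpy.cov` implements the same definition, with an unbiased 1/(n−1) normalization unless `bias=True` is passed):

```python
# A sketch (synthetic data): estimating the covariance matrix as the average
# outer product E[(x - mu)(x - mu)^t], and comparing with numpy.cov.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # 500 samples of a 3-dimensional r.v.

mu = X.mean(axis=0)                    # sample mean vector
centered = X - mu
sigma = centered.T @ centered / len(X) # E[(x - mu)(x - mu)^t], empirically

assert np.allclose(sigma, np.cov(X.T, bias=True))
assert np.allclose(sigma, sigma.T)     # a covariance matrix is symmetric
print(sigma.shape)                     # (3, 3)
```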

Page 11: Cauchy-Schwarz inequality

Expanding the variance of a sum,

var(X ± Y) = E[(X ± Y − E(X ± Y))²] = E[((X − E(X)) ± (Y − E(Y)))²]
           = E[(X − μ_x)²] ± 2E[(X − μ_x)(Y − μ_y)] + E[(Y − μ_y)²]
           = σ_x² ± 2σ_xy + σ_y² ≥ 0.

Applying this to the standardized variables X/σ_x and Y/σ_y gives 2(1 ± ρ) ≥ 0, and from this we have the Cauchy-Schwarz inequality

σ_xy² ≤ σ_x² σ_y².

The correlation coefficient is the normalized covariance

ρ(x, y) = σ_xy / (σ_x σ_y).

It always satisfies −1 ≤ ρ(x, y) ≤ 1. If ρ(x, y) = 0, the variables x and y are uncorrelated. If y = ax + b and a > 0, then ρ(x, y) = 1. If a < 0, then ρ(x, y) = −1.

Question. Prove that if X and Y are independent r.v., then ρ(x, y) = 0.
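The linear cases ρ = ±1 are easy to see numerically (the data and coefficients below are synthetic; `numpy.corrcoef` computes the normalized covariance defined above):

```python
# A sketch (synthetic data) illustrating the correlation coefficient claims on
# the slide: rho = +1 for y = a*x + b with a > 0, and rho = -1 for a < 0.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)

rho_pos = np.corrcoef(x, 3.0 * x + 2.0)[0, 1]   # a > 0  -> rho = +1
rho_neg = np.corrcoef(x, -3.0 * x + 2.0)[0, 1]  # a < 0  -> rho = -1

print(rho_pos, rho_neg)
```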

Page 12: Covariance matrices

If the variables are statistically independent, the covariances are zero, and the covariance matrix is diagonal:

      | σ_11  σ_12  …  σ_1d |   | σ_1²  σ_12  …  σ_1d |
Σ =   | σ_21  σ_22  …  σ_2d | = | σ_21  σ_2²  …  σ_2d |
      |  ⋮     ⋮    ⋱    ⋮  |   |  ⋮     ⋮    ⋱    ⋮  |
      | σ_d1  σ_d2  …  σ_dd |   | σ_d1  σ_d2  …  σ_d² |

The covariance matrix is positive semi-definite: if w is any d-dimensional vector, then w^t Σ w ≥ 0. This is equivalent to the requirement that none of the eigenvalues of Σ can ever be negative.

Page 13: Univariate normal density

The normal or Gaussian probability function is very important. In the 1-dimensional case, it is defined by the probability density function

p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).

The normal density is described as a "bell-shaped curve", and it is completely determined by μ, σ.

The probabilities obey

Pr[|x − μ| ≤ σ]  ≈ 0.68
Pr[|x − μ| ≤ 2σ] ≈ 0.95
Pr[|x − μ| ≤ 3σ] ≈ 0.997
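These three probabilities follow from the standard identity Pr[|x − μ| ≤ kσ] = erf(k/√2), which can be checked directly (a standard result, not tutorial code):

```python
# A quick numeric check of the 68-95-99.7 rule on the slide: for a normal
# density, Pr[|x - mu| <= k*sigma] = erf(k / sqrt(2)).
import math

for k, approx in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    prob = math.erf(k / math.sqrt(2.0))
    print(f"Pr[|x - mu| <= {k} sigma] = {prob:.4f} (slide: ~{approx})")
    assert abs(prob - approx) < 0.01
```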

Page 14: Multivariate Normal densities

Suppose that each of the d random variables x_i is normally distributed, each with its own mean and variance: p(x_i) ~ N(μ_i, σ_i²). If these variables are independent, their joint density has the form

p(x) = ∏_{i=1}^{d} p(x_i) = ∏_{i=1}^{d} (1 / (√(2π) σ_i)) exp(−(x_i − μ_i)² / (2σ_i²))
     = (1 / ((2π)^{d/2} ∏_{i=1}^{d} σ_i)) exp(−(1/2) ∑_{i=1}^{d} ((x_i − μ_i)/σ_i)²).

This can be written in a compact matrix form if we observe that for this case the covariance matrix is diagonal.

Page 15: Covariance matrices

• The covariance matrix is diagonal, i.e.,

Σ = diag(σ_1², σ_2², …, σ_d²),

and hence the inverse of the covariance matrix is easily written as

Σ⁻¹ = diag(1/σ_1², 1/σ_2², …, 1/σ_d²).

Page 16: Covariance matrices

and

(x − μ)^t Σ⁻¹ (x − μ) = ∑_{i=1}^{d} ((x_i − μ_i)/σ_i)².

• Finally, by noting that the determinant of Σ is just the product of the variances, |Σ| = ∏_{i=1}^{d} σ_i², we can write the joint density in the form

p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^t Σ⁻¹ (x − μ)).

• This is the general form of a multivariate normal density function, where the covariance matrix is no longer required to be diagonal.
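The general formula is a few lines of NumPy. The sketch below (my own helper, with invented parameter values) implements it and checks that for a diagonal covariance it factorizes into the product of univariate densities, as derived above:

```python
# A sketch implementing the general multivariate normal density formula from
# the slide, and checking the diagonal-covariance factorization.
import numpy as np

def mvn_density(x, mu, sigma):
    """p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^t Sigma^-1 (x-mu))"""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(sigma) @ diff        # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([1.0, -2.0])
sigma = np.diag([4.0, 9.0])                          # independent components
x = np.array([0.5, 0.0])

joint = mvn_density(x, mu, sigma)
product = np.prod([np.exp(-(xi - mi) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
                   for xi, mi, v in zip(x, mu, np.diag(sigma))])
print(joint, product)
assert np.isclose(joint, product)
```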

Page 17: Covariance matrices

The natural measure of the distance from x to the mean μ is provided by the quantity

r² = (x − μ)^t Σ⁻¹ (x − μ),

which is the square of the Mahalanobis distance from x to μ.

Page 18: Example: Bivariate Normal Density

    | σ_11  σ_12 |   | σ_1²    ρσ_1σ_2 |
Σ = | σ_21  σ_22 | = | ρσ_1σ_2 σ_2²    |,

where ρ = σ_12 / (σ_1 σ_2) is the correlation coefficient, |ρ| ≤ 1; thus

|Σ| = σ_1² σ_2² (1 − ρ²),

Σ⁻¹ = (1 / (σ_1² σ_2² (1 − ρ²))) | σ_2²      −ρσ_1σ_2 |
                                 | −ρσ_1σ_2  σ_1²     |,

and after doing the dot products in (x − μ)^T Σ⁻¹ (x − μ), with x − μ = (x_1 − μ_1, x_2 − μ_2)^T, we get the expression for the bivariate normal density:

p(x_1, x_2) = (1 / (2π σ_1 σ_2 √(1 − ρ²)))
              exp{ −(1 / (2(1 − ρ²))) [ ((x_1 − μ_1)/σ_1)² − 2ρ ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)² ] }.
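As a sanity check (the parameter values below are invented), the expanded bivariate formula should agree with the general matrix form of the density from page 16:

```python
# A sketch checking that the bivariate formula on the slide matches the general
# multivariate normal density in matrix form.
import numpy as np

mu1, mu2, s1, s2, rho = 0.5, -1.0, 2.0, 3.0, 0.6
x1, x2 = 1.0, 0.0

# Bivariate formula from the slide
z = ((x1 - mu1) / s1) ** 2 \
    - 2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2) \
    + ((x2 - mu2) / s2) ** 2
p_biv = np.exp(-z / (2 * (1 - rho ** 2))) \
        / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# General form p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^t Sigma^-1 (x-mu))
mu = np.array([mu1, mu2])
sigma = np.array([[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]])
diff = np.array([x1, x2]) - mu
p_gen = np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) \
        / ((2 * np.pi) * np.sqrt(np.linalg.det(sigma)))

print(p_biv, p_gen)
assert np.isclose(p_biv, p_gen)
```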

Page 19: Some Geometric Features

The level curves of the 2D Gaussian are ellipses; the principal axes are in the directions of the eigenvectors of Σ, and the different widths correspond to the corresponding eigenvalues.

For uncorrelated r.v. (ρ = 0) the axes are parallel to the coordinate axes.

For the extreme case of |ρ| = 1 the ellipses collapse into straight lines (in fact there is only one independent r.v.).

Marginal and conditional densities are unidimensional normal.

Page 20: Some Geometric Features

(Figures: level curves of 2D Gaussian densities.)

Page 21: Law of Large Numbers and Central Limit Theorem

Law of large numbers. Let X_1, X_2, … be a series of i.i.d. (independent and identically distributed) random variables with E[X_i] = μ. Then for S_n = X_1 + … + X_n,

lim_{n→∞} S_n / n = μ.

Central Limit Theorem. Let X_1, X_2, … be a series of i.i.d. r.v. with E[X_i] = μ and variance var(X_i) = σ². Then for S_n = X_1 + … + X_n,

(S_n − nμ) / (σ √n) →ᴰ N(0, 1).
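Both theorems are easy to illustrate by simulation (the uniform distribution and all sample sizes below are my own choices):

```python
# A simulation sketch of both theorems: the sample mean of i.i.d. uniform
# variables approaches mu, and the standardized sum is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)      # mean and std of Uniform(0, 1)

# Law of large numbers: S_n / n is close to mu for large n
n = 100_000
S_n = rng.uniform(0.0, 1.0, size=n).sum()
print(S_n / n)                            # close to 0.5

# Central limit theorem: standardized sums over many independent trials
trials = rng.uniform(0.0, 1.0, size=(2000, 500))
z = (trials.sum(axis=1) - 500 * mu) / (sigma * np.sqrt(500))
print(z.mean(), z.std())                  # roughly 0 and 1
```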