Statistical Signal Processing - University of Louisville 600...Statistical Signal Processing Aly A....

Post on 15-Mar-2021


ECE 600-03 Statistical Signal Processing

Aly A. Farag, University of Louisville

Spring 2010, www.cvip.uofl.edu

Lecture # 2 – On Random Variables

Reference – 1) My handwritten notes posted on blackboard and 2) Various resources on the web; too many to list – all are basic stuff from Textbooks.

Basic Concepts in Probability

(from the web)

Statistical Experiment

A statistical experiment E is described by the triple {S, σ, P}:

S – sample space: the set of all elementary outcomes.

σ – sigma algebra: the collection of all measurable events (subsets of S).

P – probability measure: the weight assigned to every event in σ.

Sample Space - Events

• Sample Point

– The outcome of a random experiment

• Sample Space S

– The set of all possible outcomes

– Discrete and Continuous

• Events

– A set of outcomes, thus a subset of S

– Certain, Impossible and Elementary

[Figure: Venn diagram of events A and B inside the sample space S]

Set Operations

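• Union: $A \cup B$

• Intersection: $A \cap B$

• Complement: $A^C$

• Properties

– Commutativity: $A \cup B = B \cup A$

– Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$

– Distributivity: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

– De Morgan’s Rule: $(A \cap B)^C = A^C \cup B^C$

[Figure: Venn diagram of A and B inside S]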

Axioms and Corollaries

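• Axioms

– $0 \le P(A)$

– $P(S) = 1$

– If $A \cap B = \emptyset$ then: $P(A \cup B) = P(A) + P(B)$

– If A1, A2, … are pairwise exclusive, then: $P\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} P(A_k)$

• Corollaries

– $P(A^C) = 1 - P(A)$

– $P(A) \le 1$

– $P(\emptyset) = 0$

– $P(A \cup B) = P(A) + P(B) - P(A \cap B)$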

Computing Probabilities Using Counting Methods

(k draws from a set of n distinct objects)

• Sampling With Replacement and With Ordering: $n^k$

• Sampling Without Replacement and With Ordering: $n(n-1)\cdots(n-k+1)$

• Permutations of n Distinct Objects: $n!$

• Sampling Without Replacement and Without Ordering: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{n}{n-k}$

• Sampling With Replacement and Without Ordering: $\binom{n-1+k}{k} = \binom{n-1+k}{n-1}$
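The four sampling counts can be checked numerically. The following is a minimal sketch using Python's standard `math` module; the function names are illustrative and not part of the lecture.

```python
import math

def ordered_with_replacement(n, k):
    """Number of ordered k-tuples drawn from n objects with replacement: n^k."""
    return n ** k

def ordered_without_replacement(n, k):
    """Number of ordered k-tuples without replacement: n(n-1)...(n-k+1)."""
    return math.perm(n, k)

def unordered_without_replacement(n, k):
    """Number of k-subsets of n objects: n! / (k! (n-k)!)."""
    return math.comb(n, k)

def unordered_with_replacement(n, k):
    """Number of multisets of size k drawn from n objects: C(n-1+k, k)."""
    return math.comb(n - 1 + k, k)

n, k = 5, 3  # arbitrary illustrative values
print(ordered_with_replacement(n, k))       # 125
print(ordered_without_replacement(n, k))    # 60
print(math.factorial(n))                    # 120 permutations of n objects
print(unordered_without_replacement(n, k))  # 10
print(unordered_with_replacement(n, k))     # 35
```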

Conditional Probability

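• Conditional probability of event A given that event B has occurred: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, for $P(B) > 0$

[Figure: Venn diagram of A and B inside S, highlighting $A \cap B$]

• If B1, B2, …, Bn form a partition of S, then (Law of Total Probability)

$P(A) = P(A|B_1)P(B_1) + \cdots + P(A|B_n)P(B_n) = \sum_{j=1}^{n} P(A|B_j)P(B_j)$

[Figure: partition B1, B2, B3 of S, with event A overlapping each part]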

Bayes’ Rule

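• If B1, …, Bn form a partition of S, then

$P(B_j|A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(A|B_j)P(B_j)}{\sum_{k=1}^{n} P(A|B_k)P(B_k)}$

posterior = (likelihood × prior) / evidence

Example (Binary communication channel)

[Figure: binary channel. Input 0 is sent with probability 1-p and input 1 with probability p; each bit is received correctly with probability 1-ε and flipped with probability ε]

Which input is more probable if the output is 1? A priori, both input symbols are equally likely.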
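The question can be answered by applying Bayes' rule to the two possible inputs. The sketch below assumes a symmetric channel that flips each bit with probability ε; the value ε = 0.1 and the function name are illustrative assumptions, not values given in the lecture.

```python
def posterior_inputs(p1, eps, output):
    """Return (P(in=0 | out), P(in=1 | out)) for a binary symmetric channel.

    p1     : prior probability that the input is 1
    eps    : probability that the channel flips a bit (assumed symmetric)
    output : observed output bit, 0 or 1
    """
    prior = {0: 1.0 - p1, 1: p1}
    # Likelihood P(out | in): correct with probability 1 - eps, flipped with eps.
    likelihood = {i: (1.0 - eps) if i == output else eps for i in (0, 1)}
    # Evidence P(out) from the law of total probability.
    evidence = sum(likelihood[i] * prior[i] for i in (0, 1))
    return tuple(likelihood[i] * prior[i] / evidence for i in (0, 1))

post0, post1 = posterior_inputs(p1=0.5, eps=0.1, output=1)
print(post0, post1)
```

With equally likely inputs the posterior of input 1 reduces to 1-ε, so input 1 is the more probable one whenever ε < ½.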

Event Independence

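• Events A and B are independent if $P(A \cap B) = P(A)P(B)$

• If two events have non-zero probability and are mutually exclusive, then they cannot be independent

• Pairwise independence does not imply joint independence of three events.

[Figure: events A, B and C, each with probability ½]

$P(A \cap B) = P(A)P(B)$

$P(B \cap C) = P(B)P(C)$

$P(A \cap C) = P(A)P(C)$

$P(A \cap B \cap C) = P(\emptyset) = 0 \ne P(A)P(B)P(C)$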
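Three events can be pairwise independent without being jointly independent. The sketch below uses an assumed illustration (two fair coin tosses, not necessarily the example drawn on the slide) and checks the product rule exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses: HH, HT, TH, TT, all equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favorable = sum(1 for w in outcomes if event(w))
    return Fraction(favorable, len(outcomes))

def A(w): return w[0] == "H"   # first toss is heads
def B(w): return w[1] == "H"   # second toss is heads
def C(w): return w[0] != w[1]  # the two tosses differ

def both(e1, e2):
    return lambda w: e1(w) and e2(w)

pairwise = all(
    prob(both(e1, e2)) == prob(e1) * prob(e2)
    for e1, e2 in ((A, B), (B, C), (A, C))
)
p_abc = prob(lambda w: A(w) and B(w) and C(w))

print(pairwise)                           # True
print(p_abc, prob(A) * prob(B) * prob(C)) # 0 1/8
```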

Sequential Experiments

• Sequences of Independent Experiments

– E1, E2, …, En experiments

– A1, A2, …, An respective events

– Independent if $P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n)$

• Bernoulli Trials

– Test whether an event A occurs (success or failure)

– What is the probability of k successes in n independent repetitions of a Bernoulli trial?

$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$

– Transmission over a channel with $\varepsilon = 10^{-3}$ and with 3-bit majority vote
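The majority-vote example can be worked out with the binomial formula: a decoded bit is wrong when 2 or 3 of the 3 transmitted copies are flipped. This is a minimal sketch that assumes the three transmissions are independent, each with bit-error probability ε = 10^-3.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

eps = 1e-3  # per-bit error probability from the slide
# Majority vote over 3 copies fails when at least 2 copies are in error.
p_error = binomial_pmf(2, 3, eps) + binomial_pmf(3, 3, eps)
print(p_error)  # approximately 2.998e-06
```

Under these assumptions the repetition code reduces the error probability from about 10^-3 to about 3 × 10^-6, at the cost of sending each bit three times.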

Random Variables

(from the web)

Random Variables

• The Notion of a Random Variable

– The outcome is not always a number

– Assign a numerical value to the outcome of the experiment

• Definition

– A function X which assigns a real number X(ζ) to each outcome ζ in the sample space of a random experiment

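[Figure: the random variable X maps each outcome ζ in the sample space S to a real number X(ζ) = x in the range $S_X$]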

Cumulative Distribution Function

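• Defined as the probability of the event {X ≤ x}: $F_X(x) = P(X \le x)$

• Properties

– $0 \le F_X(x) \le 1$

– $\lim_{x \to \infty} F_X(x) = 1$

– $\lim_{x \to -\infty} F_X(x) = 0$

– if $a < b$ then $F_X(a) \le F_X(b)$

– $P(a < X \le b) = F_X(b) - F_X(a)$

– $P(X > x) = 1 - F_X(x)$

[Figure: example plots of $F_X(x)$ versus x, including a staircase CDF with levels ¼, ½, ¾ and 1]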

Types of Random Variables

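• Continuous

– Probability Density Function: $f_X(x) = \frac{dF_X(x)}{dx}$, so that $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$

• Discrete

– Probability Mass Function: $P_X(x_k) = P(X = x_k)$, so that $F_X(x) = \sum_k P_X(x_k)\,u(x - x_k)$, where u is the unit step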

Probability Density Function

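• The pdf is computed from $f_X(x) = \frac{dF_X(x)}{dx}$

[Figure: pdf $f_X(x)$ with a strip of width dx, illustrating $P(x < X \le x + dx) = f_X(x)\,dx$]

• Properties

– $P(a \le X \le b) = \int_a^b f_X(x)\,dx$

– $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$

– $\int_{-\infty}^{\infty} f_X(t)\,dt = 1$

• For discrete r.v.: $f_X(x) = \sum_k P_X(x_k)\,\delta(x - x_k)$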

Conditional Distribution

• The conditional distribution function of X given the event B: $F_X(x|B) = \frac{P(\{X \le x\} \cap B)}{P(B)}$, for $P(B) > 0$

• The conditional pdf is $f_X(x|B) = \frac{dF_X(x|B)}{dx}$

• The distribution function can be written as a weighted sum of conditional distribution functions

$F_X(x) = \sum_{i=1}^{n} F_X(x|A_i)\,P(A_i)$

where the $A_i$ are mutually exclusive and exhaustive events

Expected Value and Variance

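• The expected value or mean of X is

$E[X] = \int_{-\infty}^{\infty} t\,f_X(t)\,dt$ (continuous), $E[X] = \sum_k x_k P_X(x_k)$ (discrete)

• Properties

– $E[c] = c$

– $E[cX] = c\,E[X]$

– $E[X + c] = E[X] + c$

• The variance of X is $Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

• The standard deviation of X is $Std[X] = \sqrt{Var[X]}$

• Properties

– $Var[c] = 0$

– $Var[cX] = c^2\,Var[X]$

– $Var[X + c] = Var[X]$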
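The mean and variance properties can be verified exactly for a small discrete distribution. The sketch below uses a fair six-sided die as an assumed example; the helper names are illustrative.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] = sum of x_k * P(x_k) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = E[(X - E[X])^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def affine(pmf, scale=1, shift=0):
    """pmf of scale*X + shift (scale must be nonzero so values stay distinct)."""
    return {scale * x + shift: p for x, p in pmf.items()}

die = {x: Fraction(1, 6) for x in range(1, 7)}

print(mean(die), variance(die))      # 7/2 35/12
print(mean(affine(die, scale=3)))    # 21/2  = 3 * E[X]
print(variance(affine(die, scale=3)))  # 105/4 = 9 * Var[X]
print(variance(affine(die, shift=10))) # 35/12 = Var[X]
```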

More on Mean and Variance

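• Physical Meaning

– If the pmf is viewed as a set of point masses, then the expected value μ is the center of mass, and the standard deviation σ is a measure of how far values of x are likely to depart from μ

• Markov’s Inequality (for a nonnegative r.v. X and a > 0): $P(X \ge a) \le \frac{E[X]}{a}$

• Chebyshev’s Inequality: $P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$, or equivalently $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$

• Both provide crude upper bounds, but they are useful when little is known about the r.v. beyond its mean (Markov) or its mean and variance (Chebyshev).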

Joint Distributions

• Joint Probability Mass Function of X, Y

$p_{XY}(x_j, y_k) = P(\{X = x_j\} \cap \{Y = y_k\}) = P(X = x_j, Y = y_k)$

• Probability of event A

$P((X, Y) \in A) = \sum\sum_{(x_j, y_k) \in A} p_{XY}(x_j, y_k)$

• Marginal PMFs (events involving each r.v. in isolation)

$p_X(x_j) = P(X = x_j) = \sum_{k=1}^{\infty} p_{XY}(x_j, y_k)$

• Joint CDF of X, Y

$F_{XY}(x_1, y_1) = P(X \le x_1, Y \le y_1)$

• Marginal CDFs

$F_X(x) = F_{XY}(x, \infty) = P(X \le x)$

$F_Y(y) = F_{XY}(\infty, y) = P(Y \le y)$

Conditional Probability and Expectation

• The conditional CDF of Y given the event {X=x} is

$F_Y(y|x) = \frac{\int_{-\infty}^{y} f_{XY}(x, y')\,dy'}{f_X(x)}$

• The conditional PDF of Y given the event {X=x} is

$f_Y(y|x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_X(x|y)\,f_Y(y)}{f_X(x)}$

• The conditional expectation of Y given X=x is

$E[Y|x] = \int_{-\infty}^{\infty} y\,f_Y(y|x)\,dy$

Independence of two Random Variables

• X and Y are independent if {X ≤ x} and {Y ≤ y} are independent for every combination of x, y

$F_{XY}(x, y) = F_X(x)\,F_Y(y)$

$f_{XY}(x, y) = f_X(x)\,f_Y(y)$

• Conditional pdfs of independent r.v.s

$f_Y(y|x) = f_Y(y)$

$f_X(x|y) = f_X(x)$

Probability Theory

• Primary references:

– Any Probability and Statistics textbook (Papoulis)

– Appendix A.4 in “Pattern Classification” by Duda et al.

The principles of probability theory, describing the behavior of systems with random characteristics, are of fundamental importance to pattern recognition.

Esther Levin, Dept of Computer Science, CCNY

Example 1 (Wikipedia)

• Two bowls full of cookies.

• Bowl #1 has 10 chocolate chip cookies and 30 plain cookies.

• Bowl #2 has 20 of each.

• Fred picks a bowl at random, and then picks a cookie at random.

• The cookie turns out to be a plain one.

• How probable is it that Fred picked it out of bowl #1? That is, what’s the probability that Fred picked bowl #1, given that he has a plain cookie?

• Event A is that Fred picked bowl #1.

• Event B is that Fred picked a plain cookie.

• Pr(A|B) = ?

Example 1 - continued

Tables of occurrences and relative frequencies

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies.

Number of cookies in each bowl by type of cookie

                  Bowl #1   Bowl #2   Totals
Chocolate Chip      10        20        30
Plain               30        20        50
Total               40        40        80

Relative frequency of cookies in each bowl by type of cookie

                  Bowl #1   Bowl #2   Totals
Chocolate Chip     0.125     0.250     0.375
Plain              0.375     0.250     0.625
Total              0.500     0.500     1.000

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, or 80 cookies.
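The cookie question can be answered directly from the counts with Bayes' rule. This is a minimal sketch with illustrative names, assuming (as stated in the example) that both bowls are equally likely to be picked.

```python
from fractions import Fraction

# Cookie counts from the table: counts[bowl][cookie type].
counts = {
    "bowl1": {"chocolate_chip": 10, "plain": 30},
    "bowl2": {"chocolate_chip": 20, "plain": 20},
}

def posterior_bowl(bowl, cookie):
    """P(bowl | cookie), assuming every bowl has the same prior probability."""
    prior = Fraction(1, len(counts))

    def likelihood(b):
        # P(cookie | bowl b): fraction of that cookie type inside bowl b.
        return Fraction(counts[b][cookie], sum(counts[b].values()))

    evidence = sum(likelihood(b) * prior for b in counts)
    return likelihood(bowl) * prior / evidence

print(posterior_bowl("bowl1", "plain"))  # 3/5
```

The result 3/5 equals the ratio 0.375 / 0.625 read off the relative-frequency table.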

Example 2

• 1. Power Plant Operation.

– The variables X, Y, Z describe the state of 3 power plants (X=0 means plant X is idle).

– Denote by A the event that plant X is idle, and by B the event that at least 2 out of the three plants are working.

– What are P(A) and P(A|B), the probability that X is idle given that at least 2 out of three are working?

X   Y   Z   P(x,y,z)
0   0   0   0.07
0   0   1   0.04
0   1   0   0.03
0   1   1   0.18
1   0   0   0.16
1   0   1   0.18
1   1   0   0.21
1   1   1   0.13

• P(A) = P(0,0,0) + P(0,0,1) + P(0,1,0) + P(0, 1, 1) = 0.07+0.04 +0.03 +0.18 =0.32

• P(B) = P(0,1,1) +P(1,0,1) + P(1,1,0)+ P(1,1,1)= 0.18+ 0.18+0.21+0.13=0.7

• P(A and B) = P(0,1,1) = 0.18

• P(A|B) = P(A and B)/P(B) = 0.18/0.7 =0.257
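The same numbers can be reproduced by summing the joint table over the states belonging to each event. The sketch below is illustrative; the dictionary and predicate names are assumptions.

```python
# Joint pmf P(x, y, z) from the table; 1 = working, 0 = idle.
joint = {
    (0, 0, 0): 0.07, (0, 0, 1): 0.04, (0, 1, 0): 0.03, (0, 1, 1): 0.18,
    (1, 0, 0): 0.16, (1, 0, 1): 0.18, (1, 1, 0): 0.21, (1, 1, 1): 0.13,
}

def prob(event):
    """Sum the joint pmf over all states for which the predicate holds."""
    return sum(p for state, p in joint.items() if event(state))

def is_A(state):
    return state[0] == 0      # plant X is idle

def is_B(state):
    return sum(state) >= 2    # at least two plants are working

p_A = prob(is_A)
p_B = prob(is_B)
p_AB = prob(lambda s: is_A(s) and is_B(s))
p_A_given_B = p_AB / p_B

print(round(p_A, 2), round(p_B, 2), round(p_AB, 2), round(p_A_given_B, 3))
# 0.32 0.7 0.18 0.257
```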

2. Cars are assembled in four possible locations. Plant I supplies 20% of the cars; plant II, 24%; plant III, 25%; and plant IV, 31%. There is a 1-year warranty on every car.

The company collected data that shows

P(claim|plant I) = 0.05; P(claim|plant II) = 0.11;

P(claim|plant III) = 0.03; P(claim|plant IV) = 0.08;

Cars are sold at random.

An owner just submitted a claim for her car. What are the posterior probabilities that this car was made in plant I, II, III and IV?

• P(claim) = P(claim|plant I)P(plant I) + P(claim|plant II)P(plant II) + P(claim|plant III)P(plant III) + P(claim|plant IV)P(plant IV) = 0.0687

• P(plant I|claim) = P(claim|plant I) * P(plant I)/P(claim) = 0.146

• P(plant II|claim) = P(claim|plant II) * P(plant II)/P(claim) = 0.384

• P(plant III|claim) = P(claim|plant III) * P(plant III)/P(claim) = 0.109

• P(plant IV|claim) = P(claim|plant IV) * P(plant IV)/P(claim) = 0.361
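The four posteriors follow from one application of Bayes' rule over the partition {plant I, …, plant IV}. The sketch below assumes P(claim|plant IV) = 0.08, which is the value that reproduces the total P(claim) = 0.0687 and the posterior 0.361 quoted above; the function and variable names are illustrative.

```python
def bayes_posteriors(priors, likelihoods):
    """Return (evidence, posteriors) for hypotheses that partition S."""
    # Law of total probability: P(A) = sum_k P(A | B_k) P(B_k).
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}
    return evidence, posteriors

plant_share = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}
# P(claim | plant); the value for plant IV is an assumption (see lead-in).
claim_given_plant = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}

p_claim, post = bayes_posteriors(plant_share, claim_given_plant)
print(round(p_claim, 4))                         # 0.0687
print({h: round(p, 3) for h, p in post.items()})
# {'I': 0.146, 'II': 0.384, 'III': 0.109, 'IV': 0.361}
```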

Example 3

3. It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease in a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.

a. What is the probability that a random person has a positive blood test?

b. If a blood test is positive, what’s the probability that the person has the disease?

c. If a blood test is negative, what’s the probability that the person does not have the disease?

• A is the event that a person has a disease. P(A) = 0.01; P(A’) = 0.99.

• B is the event that the test result is positive.

– P(B|A) = 0.97; P(B’|A) = 0.03;

– P(B|A’) = 0.06; P(B’|A’) = 0.94;

• (a) P(B) = P(A) P(B|A) + P(A’)P(B|A’) = 0.01*0.97 +0.99 * 0.06 = 0.0691

• (b) P(A|B) = P(B|A)*P(A)/P(B) = 0.97*0.01/0.0691 = 0.1404

• (c) P(A’|B’) = P(B’|A’)P(A’)/P(B’)= P(B’|A’)P(A’)/(1-P(B))= 0.94*0.99/(1-.0691)=0.9997

Sums of Random Variables

• z = x + y

• Var(z) = Var(x) + Var(y) + 2Cov(x,y)

Special Case: x and y are independent r.v.

• If x,y independent: Var(z) = Var(x) + Var(y)

• Distribution of z (convolution of the two densities):

$z = x + y$

$p_z(z) = p_x(x) * p_y(y) = \int_{-\infty}^{\infty} p_x(x)\,p_y(z - x)\,dx$

Examples:

• x and y are uniform on [0,1]

– Find p(z=x+y), E(z), Var(z);

• x is uniform on [-1,1], and P(y)= 0.5 for y =0, y=10; and 0 elsewhere.

– Find p(z=x+y), E(z), Var(z);
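One way to work these two examples is sketched below. It relies on E(z) = E(x) + E(y) (linearity of expectation), on Var(z) = Var(x) + Var(y) for independent x and y, and on the standard uniform moments (a+b)/2 and (b-a)²/12. Independence of x and y is assumed, and the helper names are illustrative.

```python
from fractions import Fraction

def uniform_moments(a, b):
    """(mean, variance) of a uniform r.v. on [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return (a + b) / 2, (b - a) ** 2 / 12

def discrete_moments(pmf):
    """(mean, variance) of a discrete r.v. given as {value: probability}."""
    mu = sum(v * p for v, p in pmf.items())
    return mu, sum((v - mu) ** 2 * p for v, p in pmf.items())

def sum_moments(mx, my):
    """(mean, variance) of z = x + y for independent x and y."""
    return mx[0] + my[0], mx[1] + my[1]

def triangular_pdf(z):
    """p_z(z) for example 1: convolution of two uniform [0, 1] densities."""
    if 0 <= z <= 1:
        return z
    if 1 < z <= 2:
        return 2 - z
    return 0

# Example 1: x and y uniform on [0, 1].
ex1 = sum_moments(uniform_moments(0, 1), uniform_moments(0, 1))
print(ex1)  # mean 1, variance 1/6

# Example 2: x uniform on [-1, 1], y equal to 0 or 10 with probability 1/2 each.
y_pmf = {0: Fraction(1, 2), 10: Fraction(1, 2)}
ex2 = sum_moments(uniform_moments(-1, 1), discrete_moments(y_pmf))
print(ex2)  # mean 5, variance 76/3
# Here p_z(z) is 1/4 on [-1, 1] and on [9, 11], and 0 elsewhere.
```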

Normal Distributions

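• Gaussian distribution

$p(x) = N(\mu_x, \sigma_x^2) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-(x - \mu_x)^2 / (2\sigma_x^2)}$

• Mean: $\mu_x = E[x]$

• Variance: $\sigma_x^2 = E[(x - \mu_x)^2]$

• Central Limit Theorem says that sums of many independent random variables (with finite variance) tend toward a Normal distribution.

• Mahalanobis Distance: $r = \frac{|x - \mu_x|}{\sigma_x}$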
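The univariate Gaussian density and the one-dimensional Mahalanobis distance can be evaluated directly from their definitions. This is a minimal sketch; the function names and test values are illustrative.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Normal r.v. with mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma)

def mahalanobis_1d(x, mu, sigma):
    """Distance from x to the mean, measured in standard deviations."""
    return abs(x - mu) / sigma

print(gaussian_pdf(0.0, 0.0, 1.0))    # about 0.3989, the peak of N(0, 1)
print(mahalanobis_1d(5.0, 1.0, 2.0))  # 2.0, i.e. two standard deviations
```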

Multivariate Normal Density

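• x is a vector of d Gaussian variables

$p(x) = N(\mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$

$\mu = E[x] = \int x\,p(x)\,dx$

$\Sigma = E[(x - \mu)(x - \mu)^T] = \int (x - \mu)(x - \mu)^T p(x)\,dx$

• Mahalanobis Distance: $r^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$

• All conditionals and marginals are also Gaussian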

Bivariate Normal Densities

• Level curves are ellipses.

– The x and y widths are determined by the variances, and the eccentricity by the correlation coefficient

– The principal axes are the eigenvectors of the covariance matrix, and the width in each of these directions is proportional to the square root of the corresponding eigenvalue.

Linear algebra

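• Matrix A:

$A = [a_{ij}]_{m \times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$

• Matrix Transpose

$B = [b_{ij}]_{n \times m} = A^T$; $b_{ij} = a_{ji}$; $1 \le i \le n$, $1 \le j \le m$

• Vector a

$a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$; $a^T = [a_1, \ldots, a_n]$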

Matrix and vector multiplication

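• Matrix multiplication

$A = [a_{ij}]_{m \times p}$; $B = [b_{ij}]_{p \times n}$;

$AB = C = [c_{ij}]_{m \times n}$, where $c_{ij} = row_i(A) \cdot col_j(B)$

• Outer vector product

$a = A = [a_{ij}]_{m \times 1}$; $b^T = B = [b_{ij}]_{1 \times n}$;

$c = a\,b^T = AB$, an m × n matrix

• Vector-matrix product

$A = [a_{ij}]_{m \times n}$; $b = B = [b_{ij}]_{n \times 1}$;

$C = Ab$, an m × 1 matrix = vector of length m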
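The three products above are all instances of the same row-times-column rule. The sketch below implements it for matrices stored as lists of rows; the helper name and sample values are illustrative.

```python
def matmul(A, B):
    """Product C = AB with c_ij = (row i of A) . (column j of B)."""
    m, p, n = len(A), len(B), len(B[0])
    if any(len(row) != p for row in A):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
        for i in range(m)
    ]

# Outer product: (m x 1) times (1 x n) gives an m x n matrix.
a_col = [[1], [2]]
b_row = [[3, 4, 5]]
print(matmul(a_col, b_row))  # [[3, 4, 5], [6, 8, 10]]

# Matrix-vector product: (m x n) times (n x 1) gives a vector of length m.
A = [[1, 2], [3, 4]]
b_col = [[5], [6]]
print(matmul(A, b_col))      # [[17], [39]]
```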

Inner Product

• Inner (dot) product: $a^T b = \sum_{i=1}^{n} a_i b_i$

• Length (Euclidean norm) of a vector: $\|a\| = \sqrt{a^T a} = \sqrt{\sum_{i=1}^{n} a_i^2}$

• a is normalized iff ||a|| = 1

• The angle between two n-dimensional vectors: $\cos\theta = \frac{a^T b}{\|a\|\,\|b\|}$

• An inner product is a measure of collinearity:

– a and b are orthogonal iff $a^T b = 0$

– a and b are collinear iff $|a^T b| = \|a\|\,\|b\|$

• A set of vectors is linearly independent if no vector is a linear combination of the other vectors.
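The inner product, norm and angle formulas translate directly into a few lines of code. This is a minimal sketch with illustrative names and test vectors.

```python
import math

def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length ||a|| = sqrt(a^T a)."""
    return math.sqrt(dot(a, a))

def cos_angle(a, b):
    """Cosine of the angle between two nonzero vectors."""
    return dot(a, b) / (norm(a) * norm(b))

print(norm([3, 4]))               # 5.0
print(dot([1, 0], [0, 1]))        # 0, so the vectors are orthogonal
print(cos_angle([1, 2], [2, 4]))  # close to 1, so the vectors are collinear
```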

Determinant and Trace

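• Determinant

$A = [a_{ij}]_{n \times n}$;

$\det(A) = \sum_{j=1}^{n} a_{ij} A_{ij}$, for any $i = 1, \ldots, n$;

$A_{ij} = (-1)^{i+j} \det(M_{ij})$, where $M_{ij}$ is A with row i and column j removed

• det(AB) = det(A)det(B)

• Trace

$A = [a_{ij}]_{n \times n}$; $tr[A] = \sum_{j=1}^{n} a_{jj}$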
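The cofactor expansion gives a direct recursive way to compute a determinant. The sketch below expands along the first row; it is meant only to illustrate the formula, since the recursion is far too slow for large matrices. The function names are illustrative.

```python
def minor(A, i, j):
    """Submatrix M_ij: A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1)**j is the cofactor sign (-1)^(i+j) for row index i = 0.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[j][j] for j in range(len(A)))

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(M), trace(M))      # -3 16
print(det([[2, 3], [2, 2]])) # -2
```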

Matrix Inversion

• A (n x n) is nonsingular if there exists B such that $AB = BA = I_n$; then $B = A^{-1}$

• A=[2 3; 2 2], B=[-1 3/2; 1 -1]

• A is nonsingular iff $|A| \ne 0$, i.e. its determinant is nonzero

• Pseudo-inverse for a non-square matrix, provided $A^T A$ is not singular:

$A^{\#} = [A^T A]^{-1} A^T$

$A^{\#} A = I$
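The numerical example and the pseudo-inverse formula can be checked with exact fractions. The sketch below only handles 2 × 2 inverses, which is enough here because $A^T A$ is 2 × 2 for the assumed 3 × 2 test matrix T; all helper names and the matrix T are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Row-times-column matrix product for lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate divided by the determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

A = [[2, 3], [2, 2]]
B = inverse_2x2(A)            # equals [[-1, 3/2], [1, -1]]
print(matmul(A, B))           # identity matrix, as Fractions

# Pseudo-inverse of a 3 x 2 matrix T: T# = (T^T T)^-1 T^T, so that T# T = I.
T = [[1, 0], [0, 1], [1, 1]]
T_pinv = matmul(inverse_2x2(matmul(transpose(T), T)), transpose(T))
print(matmul(T_pinv, T))      # 2 x 2 identity matrix, as Fractions
```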

Eigenvectors and Eigenvalues

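$A e_j = \lambda_j e_j$, $j = 1, \ldots, n$; $\|e_j\| = 1$

Characteristic equation: $\det[A - \lambda I_n] = 0$, an n-th order polynomial with n roots.

$tr[A] = \sum_{j=1}^{n} \lambda_j$

$\det[A] = \prod_{j=1}^{n} \lambda_j$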
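For a 2 × 2 matrix the characteristic equation is the quadratic λ² - tr[A] λ + det[A] = 0, so the trace and determinant identities can be checked by hand or in code. The sketch below assumes real eigenvalues and reuses the matrix A = [2 3; 2 2] from the matrix inversion example; the function name is illustrative.

```python
import math

def eigenvalues_2x2(A):
    """Real roots of lambda^2 - tr(A) lambda + det(A) = 0 for a 2 x 2 matrix."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    root = math.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2

A = [[2, 3], [2, 2]]
lam1, lam2 = eigenvalues_2x2(A)  # 2 + sqrt(6) and 2 - sqrt(6)
print(lam1 + lam2)  # approximately 4, the trace of A
print(lam1 * lam2)  # approximately -2, the determinant of A
```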