ECE531: Principles of Detection and Estimation
Course Introduction
D. Richard Brown III
WPI
15-January-2013
First Lecture: Major Topics
1. Administrative details: Course web page. Syllabus and textbook. Academic honesty policy. Students with disabilities statement. Piazza. Weekly quizzes. Kalman filter midterm project.
2. Course introduction.
3. Review of essential probability concepts.
Why No Homework?
I have found almost no value in collecting and grading homework.
Homework grades are a poor assessment of student understanding. In my experience, homework grades are almost entirely uncorrelated with final course grades.
Academic honesty problems.
Student focus is often on handing something in rather than really learning the material.
You still need to do problems to understand this material. I will provide several suggested problems and solutions each week for you to use for practice. You are encouraged to collaborate!
Why Weekly Quizzes?
Traditional midterm/final format leads to poor long-term retention.
Immediate feedback for instructor.
We don’t lose two class periods to exams.
Why No Lectures?
It is very difficult to pay attention to a three-hour evening lecture.
Recent trends include “flip teaching” and “mobile learning”:
Watch lecture materials outside of the classroom on your own schedule.
Do the assigned reading.
Work on the suggested problems.
Collaborate with other students (in person or via Piazza) and reinforce your understanding of the material.
Classroom meeting time can then be used for more hands-on discussion.
Bottom line: I can spend class time interacting with you and helping you understand the material, instead of lecturing.
Why Piazza?
I will answer questions on Piazza. Those questions/answers will be visible to everyone.
You can answer questions (or contribute to the discussion) on Piazza too. Helping others understand the material is an excellent way to reinforce your own understanding.
I can mark questions/answers as “good questions” and “good answers”. I can also mark duplicate questions. Try not to post duplicate questions.
If you have a question, chances are someone else has that question too. In fact, your question might already be answered on Piazza.
Some Notation
A set with discrete elements: S = {−1, π, 6}.
The cardinality of a set: |S| = 3.
The set of all integers: Z = {. . . , −1, 0, 1, . . .}.
The set of all real numbers: R = (−∞, ∞).
Intervals on the real line: [−3, 1], (0, 1], (−1, 1), [10, ∞).
Multidimensional sets:
{a, b, c}^2 is shorthand for the set {aa, ab, ac, ba, bb, bc, ca, cb, cc}.
R^2 is the two-dimensional real plane.
R^3 is the three-dimensional real volume.
An element of a set: s ∈ S.
A subset: W ⊆ S.
The probability of an event A: Prob[A] ∈ [0, 1].
The joint probability of events A and B: Prob[A, B] ∈ [0, 1].
The probability of event A conditioned on event B: Prob[A | B] ∈ [0, 1].
Typical Detection Problems
[Figure: a noisy waveform y(t) plotted versus time t ∈ [0, 2], with amplitude roughly between −3 and 3.]

Is this a sine wave plus noise, or just noise? Is the frequency of the sine wave 1 Hz or 2 Hz?
Detection is about making smart choices from a finite set of possibilities (and the consequences).
Typical Estimation Problems
[Figure: a noisy waveform y(t) plotted versus time t ∈ [0, 2], with amplitude roughly between −3 and 3.]

What is the frequency, phase, and/or amplitude of the sine wave? What is the mean and/or variance of the noise?
Estimation is about “guessing” values from an infinite number of possibilities (and the consequences).
Joint Estimation and Detection
Suppose we have a binary communication system with an intersymbol interference channel. M symbols are sent through the channel and we observe

y_k = Σ_{ℓ=0}^{L−1} h_ℓ s_{k−ℓ} + w_k

for k ∈ {0, . . . , L+M−2}, where:
Unknown binary symbols [s_0, . . . , s_{M−1}] ∈ {−1, +1}^M.
Unknown discrete-time impulse response of the channel [h_0, . . . , h_{L−1}] ∈ R^L.
Unknown noise [w_0, . . . , w_{L+M−2}] ∈ R^{L+M−1}.

In some scenarios, we may want to know both the bits that were sent and the channel coefficients. This is a joint estimation and detection problem. Why?
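The observation model above is a linear convolution of the channel with the symbols, plus noise, and can be simulated in a few lines. This is an illustrative sketch: the symbol count M = 8, channel length L = 3, and noise level 0.1 are arbitrary choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

M, L = 8, 3                               # number of symbols, channel length
s = rng.choice([-1, 1], size=M)           # unknown binary symbols in {-1,+1}^M
h = rng.standard_normal(L)                # unknown channel impulse response
w = 0.1 * rng.standard_normal(L + M - 1)  # unknown noise

# y_k = sum_{l=0}^{L-1} h_l s_{k-l} + w_k for k = 0, ..., L+M-2,
# which is exactly a linear convolution of h with s plus noise.
y = np.convolve(h, s) + w
print(y.shape)  # (10,), i.e., L + M - 1 observations
```

Note that `np.convolve` returns a vector of length L + M − 1, matching the index range k ∈ {0, . . . , L+M−2} in the model.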
Consequences
To develop optimal decision rules or estimators, we need to quantify theconsequences of incorrect decisions or inaccurate estimates.
Simple Example
It is not known if a coin is fair (F) or double headed (HH). We are given one observation of the coin flip. Based on this observation, how do you decide if the coin is F or HH?
Observation Rule 1 Rule 2 Rule 3 Rule 4
H HH F HH F
T HH F F HH
Suppose you have to pay $100 if you are wrong. Which decision rule is “optimum”?
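The four rules can also be compared programmatically. This short script (an illustration, not part of the original slides) computes each rule's maximum and average cost under the priors Prob[F] = Prob[HH] = 0.5 assumed later:

```python
# Average and maximum cost of each decision rule from the table, assuming
# priors Prob[F] = Prob[HH] = 0.5 and a $100 cost for any wrong decision.
rules = {  # rule -> (decision if H observed, decision if T observed)
    "Rule 1": ("HH", "HH"),
    "Rule 2": ("F", "F"),
    "Rule 3": ("HH", "F"),
    "Rule 4": ("F", "HH"),
}
obs_prob = {"F": {"H": 0.5, "T": 0.5}, "HH": {"H": 1.0, "T": 0.0}}
prior = {"F": 0.5, "HH": 0.5}

results = {}
for name, (d_if_H, d_if_T) in rules.items():
    decide = {"H": d_if_H, "T": d_if_T}
    # expected cost conditioned on each state of nature
    cond = {state: sum(p * (100 if decide[obs] != state else 0)
                       for obs, p in obs_prob[state].items())
            for state in ("F", "HH")}
    results[name] = (max(cond.values()),
                     sum(prior[s] * cond[s] for s in cond))

for name, (worst, avg) in results.items():
    print(f"{name}: max cost ${worst:.0f}, average cost ${avg:.0f}")
```

Running it reproduces the costs derived on the next several slides: Rule 3 has both the smallest maximum cost ($50) and the smallest average cost ($25).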
Rule 1: Always decide HH
Note that the observation is ignored here.
If the coin is F (fair), the decision was wrong and you must pay $100.
If the coin is HH (double headed), the decision was right and you pay nothing.
The maximum cost (between HH and F) for Rule 1 is $100. The average cost for Rule 1 is

C1 = Prob[F] · $100 + Prob[HH] · $0

where Prob[F] and Prob[HH] are the prior probabilities (our beliefs before any observations) on the coin being fair or double headed, respectively. For purposes of illustration, let's assume Prob[F] = Prob[HH] = 0.5, so that C1 = $50.
Rule 2: Always decide F
Again, the observation is being ignored. Same analysis as for Rule 1...
If the coin is F (fair), the decision was right and you pay nothing.
If the coin is HH (double headed), the decision was wrong and you must pay $100.
The maximum cost for Rule 2 is $100. The average cost for Rule 2 is
C2 = Prob[F] · $0 + Prob[HH] · $100
If Prob[F] = Prob[HH] = 0.5, then C2 = $50.
Rule 3: Decide HH if H observed, F if T observed
If the coin is F (fair), there is a 50% chance the observation will be H and you will decide HH. This will cost you $100. There is also a 50% chance that the observation will be T and you will decide F. In this case, you made the correct decision and pay nothing.

CF = Prob[H|F] · $100 + Prob[T|F] · $0 = $50

If the coin is HH (double headed), what is our cost? $0.
The maximum cost for Rule 3 is $50. The average cost for Rule 3 is
C3 = Prob[F] · $50 + Prob[HH] · $0
If Prob[F] = Prob[HH] = 0.5, then C3 = $25.
Rule 4: Decide F if H observed, HH if T observed
Obviously, this is a bad rule.
If the coin is F (fair), there is a 50% chance the observation will be T and you will decide HH. This will cost you $100. There is also a 50% chance that the observation will be H and you will decide F. In this case, you made the correct decision and pay nothing.

CF = Prob[T|F] · $100 + Prob[H|F] · $0 = $50

If the coin is HH (double headed), what is our cost? $100.
The maximum cost for Rule 4 is $100. The average cost for Rule 4 is

C4 = Prob[F] · $50 + Prob[HH] · $100
If Prob[F] = Prob[HH] = 0.5, then C4 = $75.
Remarks
The notion of maximum cost is the maximum over the possible “states of nature” (HH and F in our example), but averaged over the probabilities of the observation.

[Diagram: a state of nature θ ∈ {F, HH} produces an observation y ∈ {T, H} through the observation model p(y|θ).]
In our example, we could always lose $100, irrespective of the decision rule. But the maximum cost of Rule 3 was $50.
Is Rule 3 optimal?
Probability Basics: Events
Let A be a possible (or impossible) outcome of a random experiment. We call A an “event” and Prob[A] ∈ [0, 1] is the probability that A happens.
Examples:
A = tomorrow will be sunny in Worcester, Prob[A] = 0.4.
A = a 9 is rolled with two fair 6-sided dice, Prob[A] = 4/36.
A = a 13 is rolled with two fair 6-sided dice, Prob[A] = 0.
A = an odd number is rolled with two fair 6-sided dice, Prob[A] = 2/36 + 4/36 + 6/36 + 4/36 + 2/36 = 1/2.
A = any number but 9 is rolled with two fair 6-sided dice, Prob[A] = 1 − 4/36 = 32/36.

The last result used the fact that Prob[A] + Prob[Ā] = 1, where Ā means “not event A” and Prob[Ā] is the probability that A doesn't happen.
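The dice probabilities above can be verified by brute-force enumeration of the 36 equally likely outcomes (a quick check, not part of the original slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair 6-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome pair."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

print(prob(lambda o: sum(o) == 9))      # 4/36 = 1/9
print(prob(lambda o: sum(o) == 13))     # 0
print(prob(lambda o: sum(o) % 2 == 1))  # 1/2
print(prob(lambda o: sum(o) != 9))      # 32/36 = 8/9
```

Using `Fraction` keeps the results exact, so 4/36 prints as 1/9 rather than a rounded float.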
Probability Basics: Random Variables
Definition
A random variable is a mapping from random experiments to real numbers.
Example: Let X be the Dow Jones average at the close on Friday.
We can easily relate events and random variables.
Example: What is the probability that X ≥ 13500?
X is the random variable. It can be anything on the interval [0, ∞).
The event is A = “X is no less than 13500”.
To answer these types of questions, we need to know the probability distribution of the random variable X. Every random variable has a cumulative distribution function (CDF) defined as

FX(x) := Prob[X ≤ x]

for all x ∈ R.
ECE531: Course Introduction
Probability Basics: Properties of the CDF
FX(x) := Prob[X ≤ x]
The following properties are true for any random variable X:
FX(−∞) = 0.
FX(∞) = 1.
If y > x, then FX(y) ≥ FX(x).

Example: Let X be the Dow Jones average at the close on Friday.

[Figure: a sketch of the non-decreasing CDF FX(x), rising from 0 to 1, with x = 13500 marked.]
Probability Basics: The PDF
The probability density function (PDF) of the random variable X is

pX(x) := (d/dx) FX(x)

The following properties are true for any random variable X:
pX(x) ≥ 0 for all x.
∫_{−∞}^{∞} pX(x) dx = 1.
Prob[a < X ≤ b] = ∫_a^b pX(x) dx = FX(b) − FX(a).

Example: Let X be the Dow Jones average at the close on Friday.

[Figure: a sketch of the PDF pX(x), peaked near x = 13500.]
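The third property can be checked numerically. As a sketch (the standard Gaussian is chosen here only as a convenient example distribution; its CDF is expressible with `math.erf`), we compare FX(b) − FX(a) against a midpoint-rule integral of the PDF:

```python
import math

# Standard Gaussian CDF and PDF.
F = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Prob[a < X <= b] two ways: via the CDF, and by integrating the PDF.
a, b, n = -1.0, 2.0, 200000
dx = (b - a) / n
riemann = sum(p(a + (i + 0.5) * dx) for i in range(n)) * dx

print(F(b) - F(a), riemann)  # the two values agree to many decimal places
```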
Probability Basics: Mean and Variance
Definition
The mean of the random variable X is defined as

E[X] = ∫_{−∞}^{∞} x pX(x) dx.

The mean is also called the expectation.

Definition

The variance of the random variable X is defined as

var[X] = ∫_{−∞}^{∞} (x − E[X])^2 pX(x) dx.

Remark: The standard deviation of X is equal to std[X] = √(var[X]).
Probability Basics: Properties of Mean and Variance
Assuming c is a known constant, it is not difficult to show the following properties of the mean:
1. E[cX] = cE[X] (by linearity)
2. E[X + c] = E[X] + c (by linearity)
3. E[c] = c
Assuming c is a known constant, it is not difficult to show the following properties of the variance:
1. var[cX] = c2var[X]
2. var[X + c] = var[X]
3. var[c] = 0
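These properties can be sanity-checked empirically on sampled data (a sketch, not a proof; the uniform distribution and the constant c = 3 are arbitrary choices):

```python
import random
import statistics

# Empirically check E[cX] = cE[X] and var[X + c] = var[X]
# on 100,000 samples of a uniform random variable.
random.seed(1)
xs = [random.uniform(0, 1) for _ in range(100_000)]
c = 3.0

m = statistics.fmean(xs)
v = statistics.pvariance(xs)

assert abs(statistics.fmean([c * x for x in xs]) - c * m) < 1e-9
assert abs(statistics.pvariance([x + c for x in xs]) - v) < 1e-6
print("properties hold on the sampled data")
```

Scaling and shifting the samples corresponds exactly to scaling and shifting the random variable, so the identities hold on any sample up to floating-point rounding.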
Uniform Random Variables
Uniform distribution: X ∼ U(a, b) for a < b.

pX(x) = 1/(b−a) for a ≤ x ≤ b, and 0 otherwise.

[Figure: the PDF of U(a, b), a rectangle of height 1/(b−a) on the interval [a, b].]

Suppose X ∼ U(1, 5). Sketch the CDF. What is Prob[X = 3]? What is Prob[X < 2]? What is Prob[X > 1]? What is E[X]? What is var[X]?
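The exercise for X ∼ U(1, 5) can be answered with the closed-form CDF F(x) = (x − a)/(b − a) clipped to [0, 1] (this snippet is a worked illustration, not from the slides):

```python
# Answers for X ~ U(1, 5), using the closed-form CDF of a uniform RV.
a, b = 1.0, 5.0
F = lambda x: min(max((x - a) / (b - a), 0.0), 1.0)

print(F(3) - F(3))        # Prob[X = 3] = 0: a continuous RV puts no mass on a point
print(F(2))               # Prob[X < 2] = 0.25
print(1 - F(1))           # Prob[X > 1] = 1.0
print((a + b) / 2)        # E[X] = 3.0
print((b - a) ** 2 / 12)  # var[X] = 4/3
```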
Discrete Uniform Random Variables
Uniform distribution: X ∼ U(S), where S is a finite set of discrete points on the real line. Each element in the set is equally likely.

[Figure: a PMF placing equal mass on the points 1, 2, . . . , 6.]

Given S = {s1, . . . , sn}, then Prob[X = s1] = · · · = Prob[X = sn] = 1/n and

pX(x) = (1/n)(δ(x − s1) + · · · + δ(x − sn))

Suppose X ∼ U({1, 2, 3}). Sketch the CDF. What is Prob[X = 3]? What is Prob[X < 2]? What is Prob[X ≤ 2]? What is E[X]? What is var[X]?
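The discrete exercise can be answered by direct summation over S = {1, 2, 3}, where each point carries probability 1/3 (a worked illustration, using exact rational arithmetic):

```python
from fractions import Fraction

# Answers for X ~ U({1, 2, 3}): each point has probability 1/3.
S = [1, 2, 3]
p = Fraction(1, len(S))

print(p)                                    # Prob[X = 3] = 1/3
print(sum(p for s in S if s < 2))           # Prob[X < 2] = 1/3
print(sum(p for s in S if s <= 2))          # Prob[X <= 2] = 2/3
mean = sum(p * s for s in S)
print(mean)                                 # E[X] = 2
print(sum(p * (s - mean) ** 2 for s in S))  # var[X] = 2/3
```

Note the contrast with the continuous case: here Prob[X = 3] is nonzero, and Prob[X < 2] differs from Prob[X ≤ 2].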
Gaussian Random Variables
Gaussian distribution: X ∼ N(µ, σ^2) for any µ and σ > 0.

pX(x) = (1/(√(2π)σ)) exp(−(x − µ)^2 / (2σ^2))

Remarks:
1. E[X] = µ.
2. var[X] = σ^2.
3. Gaussian random variables are completely specified by their mean and variance.
4. Lots of things in the real world are Gaussian or approximately Gaussian distributed, e.g., exam scores. The Central Limit Theorem explains why this is so.
5. Probability calculations for Gaussian random variables often require the use of the erf and/or erfc functions (or the Q-function).
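For example, a Gaussian tail probability can be computed with `math.erfc` via the Q-function, Q(z) = (1/2) erfc(z/√2). The specific numbers below (µ = 1, σ = 2, threshold 3) are arbitrary choices for illustration:

```python
import math

# Tail probability Prob[X > x] for X ~ N(mu, sigma^2) via the Q-function,
# where Q(z) = 0.5 * erfc(z / sqrt(2)).
Q = lambda z: 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 1.0, 2.0
print(Q((3.0 - mu) / sigma))  # Prob[X > 3] = Q(1), approximately 0.1587
```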
Final Remarks on Scalar Random Variables
1. The PDF and CDF completely describe a random variable.
If X and Y have the same PDF, then they have the same CDF, the same mean, and the same variance.
2. The mean and variance are only partial statistical descriptions of a random variable.
If X and Y have the same mean and/or variance, they might have the same PDF/CDF, but not necessarily.

[Diagram: the PDF/CDF of a random variable determines its mean and variance, but the mean and variance do not determine the PDF/CDF.]
Joint Events
Suppose you have two events A and B. We can define a new event
C = both A and B occur
and we can write
Prob[C] = Prob[A ∩B] = Prob[A,B]
[Venn diagram: events A and B inside the sample space Ω, with overlap A ∩ B.]
Conditional Probability of Events
Suppose you have two events A and B. We can condition on the event B to write the probability

Prob[A | B] = the probability of event A given that event B happened

When Prob[B] > 0, this conditional probability is defined as

Prob[A | B] = Prob[A ∩ B] / Prob[B] = Prob[A, B] / Prob[B]

Three special cases:

[Venn diagrams: three configurations of the events A and B inside the sample space Ω.]
Conditional Probabilities: Our Earlier Example
It is not known if a coin is fair (F) or double headed (HH). We can write the conditional probabilities of a one-flip observation as
Prob[observe H | coin is F] = 0.5
Prob[observe T | coin is F] = 0.5
Prob[observe H | coin is HH] = 1
Prob[observe T | coin is HH] = 0
Can you compute Prob[coin is HH | observe H]?
We can write

Prob[coin is HH | observe H] = Prob[coin is HH, observe H] / Prob[observe H]
= Prob[observe H | coin is HH] · Prob[coin is HH] / Prob[observe H]

We are missing two things: Prob[coin is HH] and Prob[observe H]...
Conditional Probabilities: Our Earlier Example
The term Prob[coin is HH] is called the prior probability, i.e., it is our belief that the coin is unfair before we have any observations. This is assumed to be given in some of the problems that we will be considering, so let's say for now that Prob[coin is HH] = 0.5.

The term Prob[observe H] is the unconditional probability that we observe heads. Can we calculate this?

Theorem (Total Probability Theorem)

If the events B1, . . . , Bn are mutually exclusive, i.e., Prob[Bi, Bj] = 0 for all i ≠ j, and exhaustive, i.e., Σ_{i=1}^{n} Prob[Bi] = 1, then

Prob[A] = Σ_{i=1}^{n} Prob[A | Bi] Prob[Bi].

So how can we use this result to compute Prob[observe H]?
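Putting the pieces together for the coin example (with the assumed prior Prob[HH] = 0.5), the total probability theorem gives Prob[observe H], and Bayes' rule then gives the posterior:

```python
# Total probability + Bayes' rule for the coin example,
# assuming the prior Prob[coin is HH] = 0.5.
p_HH, p_F = 0.5, 0.5
p_H_given_HH, p_H_given_F = 1.0, 0.5

# Total probability theorem over the exclusive, exhaustive states HH and F:
p_H = p_H_given_HH * p_HH + p_H_given_F * p_F
print(p_H)  # 0.75

# Bayes' rule:
p_HH_given_H = p_H_given_HH * p_HH / p_H
print(p_HH_given_H)  # 2/3: observing H raises our belief that the coin is HH
```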
Independence of Events
Two events are independent if and only if their joint probability is equal to the product of their individual probabilities, i.e.,

Prob[A, B] = Prob[A] Prob[B]

Lots of events can be assumed to be independent. For example, suppose you flip a fair coin twice with A = “the first coin flip is heads”, B = “the second coin flip is heads”, and C = “both coin flips are heads”.

Are A and B independent?

Are A and C independent?

Note that when events A and B are independent,

Prob[A | B] = Prob[A, B] / Prob[B] = Prob[A] Prob[B] / Prob[B] = Prob[A].

This should be intuitively satisfying, since knowing B happened doesn't give you any useful information about A.
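Both questions can be settled by enumerating the four equally likely outcomes of two fair flips and testing the definition directly (an illustrative check):

```python
from itertools import product

# Two fair coin flips: 4 equally likely outcomes.
outcomes = list(product("HT", repeat=2))
prob = lambda ev: sum(1 for o in outcomes if ev(o)) / len(outcomes)

A = lambda o: o[0] == "H"                  # first flip is heads
B = lambda o: o[1] == "H"                  # second flip is heads
C = lambda o: o[0] == "H" and o[1] == "H"  # both flips are heads

# Test Prob[A, B] == Prob[A] * Prob[B], and likewise for A and C.
print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))  # True
print(prob(lambda o: A(o) and C(o)) == prob(A) * prob(C))  # False
```

So A and B are independent, but A and C are not: Prob[A, C] = 1/4 while Prob[A] Prob[C] = 1/8, which matches the intuition that knowing C tells you the first flip was heads.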
Joint Random Variables
When we have two random variables, we require a joint distribution. The joint CDF is defined as

FX,Y(x, y) = Prob[X ≤ x ∩ Y ≤ y] = Prob[X ≤ x, Y ≤ y]

and the joint PDF is defined as

pX,Y(x, y) = (∂^2/∂x∂y) FX,Y(x, y)

If you know the joint distribution, you can get the marginal and conditional distributions:

FX(x) = FX,Y(x, ∞)    FY(y) = FX,Y(∞, y)

pX(x) = ∫_{−∞}^{∞} pX,Y(x, y) dy    pY(y) = ∫_{−∞}^{∞} pX,Y(x, y) dx

pX(x | Y = y) = pX,Y(x, y) / pY(y)    pY(y | X = x) = pX,Y(x, y) / pX(x)

Marginals are not enough to specify the joint distribution, except in special cases.
Conditional Distributions
If X and Y are both continuous random variables, then the conditional PDF of Y given X = x is

pY(y | X = x) = pY(y | x) = pX,Y(x, y) / pX(x),

with pX,Y(x, y) the usual joint distribution of X and Y.

If X and Y are both discrete random variables (both are drawn from finite sets) with Prob[X = x] > 0, then

Prob[Y = y | X = x] = Prob[X = x, Y = y] / Prob[X = x]

If Y is a discrete random variable and X is a continuous random variable, then the conditional probability that Y = y given X = x is

Prob[Y = y | X = x] = pX,Y(x, y) / pX(x)

where pX,Y(x, y) := lim_{h→0} Prob[x − h < X ≤ x, Y = y] / h is the joint PDF-probability of the random variable X and the event Y = y.
Joint, Marginal, and Conditional Probabilities
Suppose you have two random variables X and Y that can take on discrete values {1, 2}. We can specify the joint distribution in a table, e.g.,

       X = 1   X = 2
Y = 1   0.1     0.3
Y = 2   0.2     0.4

where the value in each element of the table corresponds to the probability of the joint event X = a and Y = b, denoted as pX,Y(a, b).
Note the elements in the table sum to one.
What is the most likely event?
What is the marginal distribution pX(x)?
What is the conditional distribution on Y given X = 1?
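These three questions can be answered mechanically from the table (a worked illustration, using exact fractions to avoid float rounding):

```python
from fractions import Fraction

# Joint table: p[(x, y)] = Prob[X = x, Y = y].
p = {(1, 1): Fraction(1, 10), (2, 1): Fraction(3, 10),
     (1, 2): Fraction(2, 10), (2, 2): Fraction(4, 10)}
assert sum(p.values()) == 1  # the table entries sum to one

print(max(p, key=p.get))     # most likely joint event: (X, Y) = (2, 2)

# Marginal pX(x): sum the joint probabilities over y.
pX = {x: sum(v for (xx, _), v in p.items() if xx == x) for x in (1, 2)}
print(pX)                    # pX(1) = 3/10, pX(2) = 7/10

# Conditional Prob[Y = y | X = 1] = p(1, y) / pX(1).
pY_given_X1 = {y: p[(1, y)] / pX[1] for y in (1, 2)}
print(pY_given_X1)           # Prob[Y=1 | X=1] = 1/3, Prob[Y=2 | X=1] = 2/3
```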
Joint Statistics
Note that the means and variances are defined as usual for X and Y. When we have a joint distribution, we have two new statistical quantities:

Definition

The correlation of the random variables X and Y is defined as

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y pX,Y(x, y) dx dy.

Definition

The covariance of the random variables X and Y is defined as

cov[X, Y] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E[X])(y − E[Y]) pX,Y(x, y) dx dy.
Conditional Statistics
Definition
The conditional mean of a random variable Y given X = x is defined as

E[Y | x] = ∫_{−∞}^{∞} y pY(y | x) dy.

The definition is identical to the regular mean, except that we use the conditional PDF.

Definition

The conditional variance of a random variable Y given X = x is defined as

var[Y | x] = ∫_{−∞}^{∞} (y − E[Y | x])^2 pY(y | x) dy.
Independence of Random Variables
Two random variables are independent if and only if their joint distribution is equal to the product of their marginal distributions, i.e.,

pX,Y(x, y) = pX(x) pY(y).

If X and Y are independent, the conditional PDFs can be written as

pY(y | x) = pX,Y(x, y) / pX(x) = pX(x) pY(y) / pX(x) = pY(y)

and

pX(x | y) = pX,Y(x, y) / pY(y) = pX(x) pY(y) / pY(y) = pX(x).

These results should be intuitively satisfying, since knowing X = x (or Y = y) doesn't tell you anything about Y (or X).
Jointly Gaussian Random Variables

Definition: The random variables X = [X1, . . . , Xk]⊤ are jointly Gaussian if their joint density is given as

pX(x) = |2πP|^{−1/2} exp( −(x − µX)⊤ P^{−1} (x − µX) / 2 )

where µX = E[X] and P = E[(X − µX)(X − µX)⊤].

Remarks:
1. µX = [E[X1], . . . , E[Xk]]⊤ is a k-dimensional vector of means.
2. P is a k × k matrix of covariances, i.e.,

P = [ E[(X1 − µX1)(X1 − µX1)]  · · ·  E[(X1 − µX1)(Xk − µXk)] ]
    [            ⋮               ⋱              ⋮             ]
    [ E[(Xk − µXk)(X1 − µX1)]  · · ·  E[(Xk − µXk)(Xk − µXk)] ]
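The density above is easy to evaluate numerically. As a sketch, the check below uses k = 2 with P = I, in which case the joint density must factor into the product of two scalar N(0, 1) densities (the test point x = [0.5, −1] is arbitrary):

```python
import numpy as np

def gaussian_pdf(x, mu, P):
    """Jointly Gaussian density |2*pi*P|^(-1/2) exp(-(x-mu)^T P^-1 (x-mu)/2)."""
    d = x - mu
    norm = np.sqrt(np.linalg.det(2 * np.pi * P))
    return np.exp(-0.5 * d @ np.linalg.solve(P, d)) / norm

x = np.array([0.5, -1.0])
mu = np.zeros(2)
P = np.eye(2)  # identity covariance: X1 and X2 are independent N(0, 1)

joint = gaussian_pdf(x, mu, P)
scalar = lambda t: np.exp(-t * t / 2) / np.sqrt(2 * np.pi)
print(np.isclose(joint, scalar(x[0]) * scalar(x[1])))  # True
```

Using `np.linalg.solve` rather than forming P^{−1} explicitly is the standard numerically preferred way to apply the inverse.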
What’s Next?
This is the only “lecture” in ECE531. The remaining class meetings will be structured as:
First half: discussion/examples related to the reading assignment, screencasts, and suggested problems.
Second half: 60-minute quiz.

Next week's meeting will be focused on topics from Kay vI:1-2. These chapters provide an introduction to the problem of parameter estimation and develop the concept of minimum-variance unbiased (MVU) estimation. We will be focused on MVU estimation for the first 3-4 weeks of the course.

Any questions/discussion during the week should be posted to Piazza. Students are encouraged to collaborate on everything except the quizzes.