EC381_probabilitybasics


  • 8/10/2019 EC381_probabilitybasics


    EXAM 1 MATERIAL

    BASIC PROBABILITY

Probability of an event's complement: P[A^c] = 1 - P[A]

Probability of the union of events: P[A ∪ B] = P[A] + P[B] - P[A ∩ B]

Conditional Probability: P[A|B] = P[A ∩ B] / P[B], for P[B] > 0

which directly implies that: P[A ∩ B] = P[A|B]·P[B] = P[B|A]·P[A]

Law of Total Probability: For event space {B_1, ..., B_n}, P[A] = Σ_i P[A|B_i]·P[B_i]

Bayes' Rule: For event space {B_1, ..., B_n}, P[B_i|A] = P[A|B_i]·P[B_i] / Σ_j P[A|B_j]·P[B_j]

Definition of Independence: P[A ∩ B] = P[A]·P[B]

which implies that: P[A|B] = P[A]

and P[B|A] = P[B]
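The law of total probability and Bayes' rule can be checked numerically. A minimal sketch in Python, using a made-up diagnostic-test example (the numbers 0.01, 0.95, 0.10 are illustrative assumptions, not from the notes):

```python
# Made-up diagnostic-test example: D = "has disease", + = "test positive".
p_disease = 0.01                      # prior P[D]
p_pos_given_disease = 0.95            # P[+|D]
p_pos_given_healthy = 0.10            # P[+|D^c]

# Law of total probability: P[+] = P[+|D]P[D] + P[+|D^c]P[D^c]
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule: P[D|+] = P[+|D]P[D] / P[+]
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 4))  # about 0.0876 - small, despite the "95% accurate" test
```

Note how the small prior drags the posterior down; this is exactly the weighting Bayes' rule performs.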


    DISCRETE RANDOM VARIABLES

Probability Mass Function: P_X(x) = P[X = x]

Cumulative Distribution Function: F_X(x) = P[X ≤ x] = Σ_{k ≤ x} P_X(k)

***Expected Value: E[X] = Σ_x x·P_X(x)

NOTES: It's very possible to have an expected value that couldn't actually happen. For example, if a prof only gives out 90s and 100s, and there is a 50% likelihood of each and an even number of students in the class, then the expected value of the grades is 95 - even though the prof will never actually give a 95.
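That 90/100 grade example can be written as a one-line expectation:

```python
# E[X] = sum over outcomes x of x * P[X = x]
pmf = {90: 0.5, 100: 0.5}
expected_grade = sum(x * p for x, p in pmf.items())
print(expected_grade)  # 95.0 - a grade the prof never actually gives
```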

Expected value is LINEAR!!!! This is a beautiful thing, allowing us to do things like this: E[aX + b] = a·E[X] + b, and E[X + Y] = E[X] + E[Y].

***Variance: Var[X] = E[(X - E[X])²] = E[X²] - (E[X])²

Standard Deviation: σ_X = √(Var[X])

Conditional PMF: P_{X|A}(x) = P[X = x | A]

Conditional Expected Value: E[X|A] = Σ_x x·P_{X|A}(x)

Families of Discrete Random Variables (list is not exhaustive, but includes the "most important"):

Bernoulli - single trial with two possible outcomes (e.g., flipping a coin, answering a yes/no question)

For 0 < p < 1: P_X(x) = 1 - p for x = 0; p for x = 1; 0 otherwise. E[X] = p, Var[X] = p(1 - p).


Binomial - repeated trials of Bernoullis (e.g., flipping several coins in sequence, answering several yes/no questions in sequence)

For a positive integer n and 0 < p < 1: P_X(x) = (n choose x)·p^x·(1 - p)^(n - x) for x = 0, 1, ..., n; 0 otherwise. E[X] = np, Var[X] = np(1 - p).

Geometric - number of successes until a (given number of) failure(s), or number of failures until a (given number of) success(es) (e.g., running the Boston marathon every year until the year you manage to finish; cold-calling people for donations until you have six donations)

For 0 < p < 1: P_X(x) = (1 - p)^(x - 1)·p for x = 1, 2, ...; 0 otherwise. E[X] = 1/p, Var[X] = (1 - p)/p².

Discrete Uniform - outcomes in a given range all have an equal likelihood of occurring (e.g., rolling a die)

For integers a and b such that a < b: P_X(x) = 1/(b - a + 1) for x = a, a + 1, ..., b; 0 otherwise. E[X] = (a + b)/2, Var[X] = (b - a)(b - a + 2)/12.
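The pmfs above are easy to code up and sanity-check (every pmf must sum to 1 over its support). A sketch using only the standard library; parameter conventions vary by textbook, so check yours:

```python
import math

def bernoulli_pmf(k, p):            # k in {0, 1}
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):          # k successes in n independent trials
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):            # first success on trial k = 1, 2, ...
    return (1 - p)**(k - 1) * p

def discrete_uniform_pmf(k, a, b):  # integers a <= k <= b
    return 1 / (b - a + 1) if a <= k <= b else 0.0

# Sanity check: each pmf should sum to (essentially) 1 over its support.
total_binomial = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
total_geometric = sum(geometric_pmf(k, 0.3) for k in range(1, 200))
```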


    CONTINUOUS RANDOM VARIABLES

Probability Density Function: The density function for continuous random variables is different from the mass function for discrete random variables in the sense that we are no longer looking for "mass" at a particular integer outcome to indicate the probability of that outcome. Instead, we consider the area under the function's curve within a range of outcomes as the indicator of probability for that range of outcomes; hence, we integrate between these limits to determine probabilities for outcomes of continuous random variables, and the total area under a density function must equal 1:

P[a ≤ X ≤ b] = ∫ from a to b of f_X(x) dx, with f_X(x) ≥ 0 and ∫ over all x of f_X(x) dx = 1

Cumulative Distribution Function: F_X(x) = P[X ≤ x] = ∫ from -∞ to x of f_X(u) du

Because the cumulative distribution function is equal at every point to the value of the probability density function's integral from -∞ up to that point, finding the value of the CDF at a point is equivalent to finding the probability that a random variable's outcome is less than or equal to that point. In other words, its definition has not changed.

***Expected Value: As with all transitions from discrete to continuous values, we should expect summations to be replaced by integrals, and that is the only change you see here: E[X] = ∫ x·f_X(x) dx; and, as you might expect: E[g(X)] = ∫ g(x)·f_X(x) dx.

***Variance: Because we define variance in terms of expected values, and because we have amended our definition of expected value to utilize the necessary integration, the definition of variance is exactly the same as before: Var[X] = E[(X - E[X])²] = E[X²] - (E[X])².

Standard Deviation: Again, this is the same as before: σ_X = √(Var[X]).
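The "sums become integrals" idea can be checked numerically. A sketch that integrates an exponential pdf (λ = 2 is an arbitrary choice) with a simple midpoint rule and recovers area 1, E[X] = 1/λ, and Var[X] = 1/λ²:

```python
import math

# Exponential pdf f(x) = lam * exp(-lam * x) for x >= 0; lam = 2 chosen for illustration.
lam = 2.0

def f(x):
    return lam * math.exp(-lam * x)

def integrate(g, a, b, n=200_000):
    # simple midpoint rule; fine for smooth, rapidly decaying integrands
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

upper = 40.0  # tail mass beyond this point is negligible for lam = 2
total_area = integrate(f, 0.0, upper)                          # should be ~1
mean = integrate(lambda x: x * f(x), 0.0, upper)               # E[X] = 1/lam = 0.5
second_moment = integrate(lambda x: x * x * f(x), 0.0, upper)  # E[X^2]
variance = second_moment - mean ** 2                           # 1/lam^2 = 0.25
```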


Families of Continuous Random Variables (list is not even close to exhaustive, but includes the "most important"):

Uniform - pdf is uniformly distributed on (a,b): For constants a < b, f_X(x) = 1/(b - a) for a < x < b; 0 otherwise. E[X] = (a + b)/2, Var[X] = (b - a)²/12.

Exponential (λ) - For λ > 0: f_X(x) = λ·e^(-λx) for x ≥ 0; 0 otherwise. E[X] = 1/λ, Var[X] = 1/λ².

Erlang (n, λ) - For λ > 0 and a positive integer n: f_X(x) = λ^n·x^(n-1)·e^(-λx)/(n - 1)! for x ≥ 0; 0 otherwise. E[X] = n/λ, Var[X] = n/λ².

Gaussian (μ, σ) - NOTE: Gaussian RVs are HUGELY important!!!!! These babies AREN'T going away, so start loving them now ...

For constants μ and σ > 0: f_X(x) = (1/(σ√(2π)))·e^(-(x - μ)²/(2σ²)). E[X] = μ, Var[X] = σ².
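These pdf formulas translate directly into code. A sketch with the standard textbook forms (conventions vary, so treat the parameterizations as assumptions); it also confirms that an Erlang with n = 1 collapses to an exponential:

```python
import math

def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def erlang_pdf(x, n, lam):  # n a positive integer
    if x < 0:
        return 0.0
    return lam**n * x**(n - 1) * math.exp(-lam * x) / math.factorial(n - 1)

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# An Erlang with n = 1 is just an exponential:
check = abs(erlang_pdf(1.3, 1, 2.0) - exponential_pdf(1.3, 2.0)) < 1e-12
```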


    FUNCTIONS OF RANDOM VARIABLES

If random variable Y is a linear transformation of random variable X, i.e., Y = aX + b, then:

E[Y] = a·E[X] + b

    This shouldn't surprise you, because expected value is linear!!

Var[Y] = a²·Var[X]

Again, there is no surprise here! Addition of a constant just shifts the location of the density function - it does NOT affect the spread of the pdf, which is what variance measures. Only the scaling affects the variance, and it shouldn't surprise you that the effect is quadratic, for variance is essentially a quadratic measure (the second moment minus the square of the expected value).

Are the above relationships true for ANY random variable? YES!!!! This is true for ANY random variable, provided that the transformation of that random variable is LINEAR.

NOTE: Linear transformations of any kind on uniform and Gaussian random variables produce uniform and Gaussian random variables, respectively.

This is not so for exponential and Erlang random variables, the distributions for which are constrained to be 0 for values less than 0 and are constrained as well to start at 0. Hence, ONLY linear transformations of the form Y = aX, where a > 0, result in a Y that is an exponential or an Erlang random variable, respectively. Shifting the original X distribution in any direction results in a random variable that is neither exponential nor Erlang (respectively); likewise, scaling the original X distribution by a negative constant will flip the original distribution, again resulting in a random variable that is neither exponential nor Erlang (respectively).
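A quick Monte Carlo check of E[aX + b] = a·E[X] + b and Var[aX + b] = a²·Var[X]; X here is exponential with rate 1 and a, b are arbitrary illustrative constants:

```python
import random

random.seed(0)
a, b = 3.0, -2.0
xs = [random.expovariate(1.0) for _ in range(200_000)]
ys = [a * x + b for x in xs]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# Theory: E[X] = 1 and Var[X] = 1, so E[Y] = a + b = 1 and Var[Y] = a^2 = 9.
```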


    THE STANDARD NORMAL (AND COMPLEMENTARY) CDFs

Standard Normal CDF: Φ(z) = P[Z ≤ z]. Complementary CDF: Q(z) = P[Z > z] = 1 - Φ(z).

NOTE: The standard normal CDF is just the CDF of a Gaussian PDF centered at 0, with variance 1. Because the CDF is always the integral of the PDF, this gives us the following definition:

Φ(z) = ∫ from -∞ to z of (1/√(2π))·e^(-u²/2) du

Notice that the function under integration is exactly the 0 mean, variance 1 Gaussian.

Transformation of X, a Gaussian Random Variable, to a Standard Normal Random Variable: We need to be able to do this transformation because there is no analytical solution for the integration of a Gaussian pdf; thus, we need to be able to linearly transform a general Gaussian (non-zero mean and/or non-unity variance) into the standard Gaussian. To do this: Z = (X - μ_X)/σ_X.

Probability that X is less than a: This is straightforward, given the standard normal CDF. We simply convert our X value (a) to Z, and find the value of the CDF at that point, because the standard normal CDF is, by definition, P[Z ≤ z]:

P[X ≤ a] = Φ((a - μ_X)/σ_X)
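Since Φ has no closed form, it is typically computed via the error function; `math.erf` is in the Python standard library, and Φ(z) = (1/2)(1 + erf(z/√2)). A sketch of the standardization step:

```python
import math

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_cdf(a, mu, sigma):
    # P[X <= a] for X ~ N(mu, sigma^2), by standardizing first
    return phi((a - mu) / sigma)

print(round(phi(0.0), 4))  # 0.5
print(round(phi(1.0), 4))  # 0.8413
```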


    EXAM 2 MATERIAL

    Pairs of Random Variables

Relationships between Distributions/Mass Functions:

Discrete:

Marginals: P_X(x) = Σ_y P_{X,Y}(x, y) and P_Y(y) = Σ_x P_{X,Y}(x, y)

Probability of Event A: P[A] = Σ over (x, y) in A of P_{X,Y}(x, y)

Joint Conditional (on Event A): P_{X,Y|A}(x, y) = P_{X,Y}(x, y)/P[A] for (x, y) in A; 0 otherwise

NOTE: To find the marginal P_{X|A}(x), can marginalize the expression on the left over the values of y in A!

Conditional (on a Random Variable): P_{X|Y}(x|y) = P_{X,Y}(x, y)/P_Y(y) and P_{Y|X}(y|x) = P_{X,Y}(x, y)/P_X(x)

Bayes' Rule: P_{X|Y}(x|y) = P_{Y|X}(y|x)·P_X(x)/P_Y(y)

Independence: Two RVs are independent if and only if P_{X,Y}(x, y) = P_X(x)·P_Y(y) for all x, y ...

...which directly implies that for independent RVs: P_{X|Y}(x|y) = P_X(x) and P_{Y|X}(y|x) = P_Y(y)

Continuous: (same structure, with pdfs in place of pmfs and integrals in place of sums)

Marginals: f_X(x) = ∫ f_{X,Y}(x, y) dy and f_Y(y) = ∫ f_{X,Y}(x, y) dx

Probability of Event A: P[A] = ∫∫ over A of f_{X,Y}(x, y) dx dy

Joint Conditional (on Event A): f_{X,Y|A}(x, y) = f_{X,Y}(x, y)/P[A] for (x, y) in A; 0 otherwise

NOTE: To find the marginal f_{X|A}(x), can marginalize the expression on the left over the values of y in A!

Conditional (on a Random Variable): f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) and f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x)

Bayes' Rule: f_{X|Y}(x|y) = f_{Y|X}(y|x)·f_X(x)/f_Y(y)

Independence: Two RVs are independent if and only if f_{X,Y}(x, y) = f_X(x)·f_Y(y) for all x, y ...

...which directly implies that for independent RVs: f_{X|Y}(x|y) = f_X(x) and f_{Y|X}(y|x) = f_Y(y)
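Marginalizing a joint pmf and testing independence is mechanical. A sketch on a small made-up joint table (values chosen so X and Y happen to be independent):

```python
# joint[(x, y)] = P_{X,Y}(x, y); the numbers are illustrative.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.15, (1, 1): 0.45,
}

# Marginals: P_X(x) = sum over y of P_{X,Y}(x, y), and likewise for Y.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Independence: P_{X,Y}(x, y) = P_X(x) * P_Y(y) at every (x, y).
independent = all(
    abs(p - px[x] * py[y]) < 1e-12 for (x, y), p in joint.items()
)
```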


Covariance/Correlation: Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]·E[Y]

NOTE: Remember, shifts (addition of scalars) don't affect covariances. Also, scalars can be pulled out, i.e., Cov[aX, bY] = ab·Cov[X, Y].

Correlation coefficient: ρ_{X,Y} = Cov[X, Y]/(σ_X·σ_Y), with -1 ≤ ρ_{X,Y} ≤ 1. NOTE: ρ_{X,Y} = 0 means X and Y are uncorrelated; |ρ_{X,Y}| = 1 means completely correlated (linear relationship, i.e., Y = aX + b).

Uncorrelated vs. Independent RVs: *Independent RVs are uncorrelated, but uncorrelated RVs are NOT necessarily independent, unless they are JOINTLY Gaussian RVs!!

X,Y Independent ==> X,Y Uncorrelated, but X,Y Uncorrelated does NOT imply X,Y Independent

Iterated Expectation: E[X] = E[E[X|Y]]
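Sample covariance and the correlation coefficient are direct translations of the formulas above; this sketch (with made-up data) also checks that shifting the data leaves the covariance unchanged:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]   # roughly linear in xs, so rho should be near 1

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

shifted = [x + 100.0 for x in xs]                        # adding a constant...
same_cov = abs(cov(shifted, ys) - cov(xs, ys)) < 1e-9    # ...doesn't change Cov
```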


    Jointly Gaussian Random Variables

If X,Y are jointly Gaussian random variables (not necessarily independent), then:

Their joint distribution is Gaussian (hence the phrase "jointly Gaussian"). Their marginal distributions f_X(x) and f_Y(y) are Gaussian. And for jointly Gaussian X,Y: X,Y uncorrelated if and only if X,Y independent.

Linear combinations of the Gaussians are Gaussian, i.e.:

W = aX + bY + c is Gaussian, with: E[W] = a·E[X] + b·E[Y] + c and Var[W] = a²·Var[X] + b²·Var[Y] + 2ab·Cov[X, Y]

Linear transformations of X and Y are invertible transformations, and hence the transformed variables are also jointly Gaussian, i.e., they are not only marginally Gaussian, but also are jointly Gaussian!

The conditional f_{X|Y}(x|y) is Gaussian, with: E[X|Y = y] = μ_X + ρ·(σ_X/σ_Y)·(y - μ_Y) and Var[X|Y = y] = σ_X²·(1 - ρ²). The conditional f_{Y|X}(y|x) is Gaussian, with: E[Y|X = x] = μ_Y + ρ·(σ_Y/σ_X)·(x - μ_X) and Var[Y|X = x] = σ_Y²·(1 - ρ²).
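One common way to build a jointly Gaussian pair with a chosen correlation ρ is as a linear combination of two independent standard normals (a 2x2 Cholesky-style construction; ρ = 0.8 below is an arbitrary choice). Since Y is a linear combination of Gaussians, (X, Y) are jointly Gaussian with Corr(X, Y) = ρ:

```python
import math
import random

random.seed(1)
rho = 0.8
xs, ys = [], []
for _ in range(100_000):
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    xs.append(z1)                                      # X = Z1
    ys.append(rho * z1 + math.sqrt(1.0 - rho**2) * z2) # Y = rho*Z1 + sqrt(1-rho^2)*Z2

# Sample correlation should land near rho.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n
sample_corr = cxy / math.sqrt(vx * vy)
```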


    Detection (Binary Hypothesis Testing)

We have two hypotheses, H0 ("null hypothesis"/"nothing's there") and H1 ("hypothesis"/"something's there"). We observe some random variable Y and want to decide, based on its value, which hypothesis is true.

    Errors:

Missed Detection: Choose H0 when H1 is true.

False Alarm: Choose H1 when H0 is true.

Probability of Error: This is always equal to the probability of missed detection times the a priori probability of the hypothesis plus the probability of false alarm times the a priori probability of the null hypothesis, i.e.:

P[err] = P[MD]·P[H1] + P[FA]·P[H0]

Expected Value of the Cost of Errors: All we do here is factor in the associated costs to the probability of error computation, i.e.:

E[C] = C_MD·P[MD]·P[H1] + C_FA·P[FA]·P[H0]

    Detectors:

NOTE: All detectors are written for the continuous case. As always, for the discrete case, the expressions are the same, but with big "P" substituted for little "f".

Maximum Likelihood (ML) Detector: Most basic detector; compares the likelihood ratio to 1 in order to determine which density is larger. If the density in the numerator is larger, then the ratio is larger than 1; if the ratio is smaller than 1, then the density in the denominator must be larger.

Choose H1 if f_{Y|H1}(y)/f_{Y|H0}(y) > 1; otherwise choose H0.

Maximum A Posteriori (MAP) Detector: Minimizes probability of error by weighting the likelihoods with the a priori probabilities. Note that if the a prioris are equal, then the MAP detector simplifies to the ML detector.

Choose H1 if f_{Y|H1}(y)/f_{Y|H0}(y) > P[H0]/P[H1]; otherwise choose H0.

Minimum Cost (Bayes' Risk) Detector: Minimizes expected value of the cost of error by weighting the likelihoods by both the a priori probabilities and the costs of missed detection/false alarm. Note that if the costs are equal, then the minimum cost detector simplifies to the MAP detector.

Choose H1 if f_{Y|H1}(y)/f_{Y|H0}(y) > (C_FA·P[H0])/(C_MD·P[H1]); otherwise choose H0.
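A sketch of all three detectors for two Gaussian hypotheses, H0: Y ~ N(0, 1) vs. H1: Y ~ N(2, 1); the priors (0.7/0.3) and costs (1/10) are illustrative choices, not from the notes:

```python
import math

def gauss_pdf(y, mu):
    return math.exp(-((y - mu) ** 2) / 2) / math.sqrt(2 * math.pi)

def likelihood_ratio(y):
    return gauss_pdf(y, 2.0) / gauss_pdf(y, 0.0)  # f(y|H1) / f(y|H0)

def ml_decide(y):
    return 1 if likelihood_ratio(y) > 1 else 0

def map_decide(y, p0=0.7, p1=0.3):
    # choose H1 when L(y) > P[H0]/P[H1]
    return 1 if likelihood_ratio(y) > p0 / p1 else 0

def min_cost_decide(y, p0=0.7, p1=0.3, c_fa=1.0, c_md=10.0):
    # choose H1 when L(y) > (C_FA * P[H0]) / (C_MD * P[H1])
    return 1 if likelihood_ratio(y) > (c_fa * p0) / (c_md * p1) else 0
```

For these Gaussians, L(y) = exp(2y - 2), so the ML threshold on y itself is the midpoint of the means (decide H1 iff y > 1); the MAP and minimum-cost detectors just slide that threshold toward or away from the costlier/likelier hypothesis.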


    Useful Math Facts:

Simplification of the likelihood ratio used in detection becomes much easier when logarithms are taken. So, remember:

*the log of the product is the sum of the logs: log(ab) = log a + log b

*the log of the quotient is the difference of the logs: log(a/b) = log a - log b

*the log of a power is the exponent times the log: log(a^n) = n·log a

(We engineers know that the log can be taken to any base, including base e, so no need to specify 'ln'.)

*log 1 = 0


    Estimation

We want to know X, but we can only observe Y; so, we have to make an educated guess at the value of X given that we have observed some value of Y.

Biased/Unbiased: bias = E[X̂] - E[X]

NOTE: If bias = 0, then E[X̂] = E[X], and the estimator is said to be "unbiased".

ML Estimation:

Choose the value of X that maximizes the conditional distribution f_{Y|X}(y|x), called the "likelihood function": x̂_ML(y) = argmax_x f_{Y|X}(y|x)

MAP Estimation:

Choose the value of X that maximizes the conditional distribution, but now we multiply by the prior distribution on X, because we no longer assume equal priors (as in the ML case): x̂_MAP(y) = argmax_x f_{Y|X}(y|x)·f_X(x)

Minimum Mean Square Error (MMSE) Estimation:

Choose the value of X that minimizes the mean square error, E[(X - x̂)²], over ALL possible relationships (even nonlinear ones) between X and Y.

Case 1 (Blind Estimate): x̂ = E[X]

Case 2 (A, some attribute of X, is observed, thus restricting the possibilities for X): x̂ = E[X|A]

Case 3 (A dependent random variable's value, Y = y, is observed): x̂ = E[X|Y = y]

Properties of MMSE Estimator:

*Unbiased

*Estimate is orthogonal to (uncorrelated with) the estimation error

*All functions of the data used in the estimate are orthogonal to (uncorrelated with) the estimation error

Linear Least Squares Error Estimation:

Easier than MMSE, because we need the conditional distribution for MMSE! If we don't have that distribution, but know the key statistics of X and Y, then:


Choose the value of X that minimizes the mean square error, E[(X - x̂)²], over ONLY LINEAR relationships between X and Y:

x̂_L(y) = μ_X + ρ_{X,Y}·(σ_X/σ_Y)·(y - μ_Y)

where, clearly, ρ_{X,Y} = Cov[X, Y]/(σ_X·σ_Y).

    Properties of LLSE Estimator:

    *Unbiased

    *Estimate is orthogonal to (uncorrelated with) the estimation error

    *Estimation error is orthogonal to (uncorrelated with) the data used in the estimate

Mean Square Error of the LLSE Estimator:

e_L = σ_X²·(1 - ρ_{X,Y}²)

NOTE: This is also called the "variance of the error", or Var[E], which is the variance in this case because the estimator is unbiased, so the error has zero mean.

Let E = X - X̂_L(Y), the error in the linear estimate. Then the mean square error of the estimate is:

E[E²] = σ_X²·(1 - ρ_{X,Y}²)

BIG NOTE: X,Y Jointly Gaussian:

--> the conditional distribution is Gaussian, and actually, E[X|Y = y] IS linear in y. So, in this case, x̂_MMSE(y) = x̂_L(y)!!!!!
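The LLSE formulas need only the five key statistics, which makes them a two-liner. A sketch (all the numbers passed in are illustrative):

```python
def llse(y, mu_x, mu_y, sigma_x, sigma_y, rho):
    # Xhat_L(y) = mu_X + rho * (sigma_X / sigma_Y) * (y - mu_Y)
    return mu_x + rho * (sigma_x / sigma_y) * (y - mu_y)

def llse_mse(sigma_x, rho):
    # e_L = sigma_X^2 * (1 - rho^2)
    return sigma_x**2 * (1 - rho**2)

# If rho = 0, the best linear guess ignores Y entirely and falls back to the
# blind estimate mu_X, and the MSE is just Var[X].
blind = llse(y=3.7, mu_x=1.0, mu_y=0.0, sigma_x=2.0, sigma_y=1.0, rho=0.0)
```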


    EXAM 3 MATERIAL

    LIMIT THEOREMS

    Sums of Random Variables

Let W = X_1 + X_2 + ... + X_n. Then:

E[W] = E[X_1] + E[X_2] + ... + E[X_n]

by the linearity of expected value. In general, Var[W] = Σ_i Var[X_i] + Σ over i ≠ j of Cov[X_i, X_j], which, if the X_i are independent (or just uncorrelated), simplifies to:

Var[W] = Var[X_1] + Var[X_2] + ... + Var[X_n]

Repeat after me: "If the RVs are independent, then the variance of the sum equals the sum of the variances."

NOTE: PDFs of the sums of INDEPENDENT RVs are convolutions of the individual PDFs!!!

What if N is random? (i.e., we don't know how many variables we're adding ...)

Let W = X_1 + X_2 + ... + X_N, and let the X_i be i.i.d. (independent, identically distributed). Then:

E[W] = E[N]·E[X] and Var[W] = E[N]·Var[X] + Var[N]·(E[X])²

    Average of Random Variables

Let X_1, ..., X_n be i.i.d. (or even just uncorrelated) RVs, and let M_n(X) = (X_1 + ... + X_n)/n be their sample mean. Then:

E[M_n(X)] = E[X] and Var[M_n(X)] = Var[X]/n

    Markov Inequality

For non-negative RV X and any c > 0: P[X ≥ c] ≤ E[X]/c


    Chebyshev Inequality (this gives a tighter bound than the Markov inequality ...)

For any RV X (not necessarily non-negative) with mean μ_X, and any c > 0: P[|X - μ_X| ≥ c] ≤ Var[X]/c²
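Both inequalities are easy to verify empirically. A sketch using an exponential RV with rate 1 (so E[X] = 1 and Var[X] = 1; the thresholds a = c = 3 are arbitrary choices):

```python
import random

random.seed(2)
xs = [random.expovariate(1.0) for _ in range(100_000)]
n = len(xs)

a = 3.0
p_tail = sum(x >= a for x in xs) / n   # empirical P[X >= 3], truly e^-3 ~ 0.05
markov_bound = 1.0 / a                 # Markov: E[X]/a = 1/3

mu, c = 1.0, 3.0
p_dev = sum(abs(x - mu) >= c for x in xs) / n  # empirical P[|X - 1| >= 3]
chebyshev_bound = 1.0 / c**2                   # Chebyshev: Var[X]/c^2 = 1/9
```

Both bounds hold, and both are loose here; they trade tightness for requiring almost no knowledge of the distribution.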

    Laws of Large Numbers

    Weak Law:

Let X_1, X_2, ... be i.i.d. (although the weak law also holds for uncorrelated RVs). Then:

lim as n → ∞ of P[|M_n(X) - μ_X| ≥ ε] = 0

for any ε > 0, as arbitrarily close to 0 as we like. Stated another way: the WLLN says that as the number of RVs we're summing approaches infinity, the sample mean approaches the true mean with a probability that approaches certainty!!!!

    Strong Law:

This is very similar to the weak law, except we require that the RVs be i.i.d., and we are essentially changing the above statement to say that as the number of samples approaches infinity, the sample mean actually equals the true mean with absolute certainty:

P[lim as n → ∞ of M_n(X) = μ_X] = 1

    Central Limit Theorem

Remember, the CLT is your friend!!! It basically says that, when we sum together a bunch of i.i.d. RVs, we get a Gaussian - which means we can apply all of those nice Gaussian properties simply by invoking the CLT.

Formally, this says that in the limit as n approaches infinity, the CDF of the sum (NOT the sample mean!!!) of the RVs approaches a Gaussian CDF. So, we can use:

Z_n = (X_1 + ... + X_n - n·μ_X)/(σ_X·√n), with F_{Z_n}(z) → Φ(z) as n → ∞

for i.i.d. X_i, in which we transform to a standard normal Gaussian CDF, then use the phi or Q function to get the probability we seek. We can of course express the mean and standard deviation above in terms of the mean and standard deviation of the X_i as follows...

Remember: E[X_1 + ... + X_n] = n·μ_X, and the standard deviation of the sum is σ_X·√n.
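The CLT can be seen numerically: the standardized sum of n i.i.d. Uniform(0,1) RVs (mean 1/2, variance 1/12) is already very close to standard normal even at n = 30. A sketch (n and the trial count are arbitrary choices):

```python
import math
import random

random.seed(3)
n, trials = 30, 50_000
mu, var = 0.5, 1.0 / 12.0  # mean and variance of a single Uniform(0,1)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / math.sqrt(n * var)   # Z_n = (sum - n*mu)/(sigma*sqrt(n))

zs = [standardized_sum() for _ in range(trials)]

# Compare the empirical P[Z_n <= 1] against Phi(1) ~ 0.8413.
empirical = sum(z <= 1.0 for z in zs) / trials
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
```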


    Confidence Intervals

Here, the confidence interval is (â - ε, â + ε); â - ε is called the lower limit, while â + ε is called the upper limit. Moreover, 1 - α is called the confidence coefficient. Notice that if α is small, then we're more sure that the outcome we've estimated lies within the confidence interval.

By Chebyshev's inequality: P[|M_n(X) - μ_X| ≥ c] ≤ Var[X]/(n·c²)

Theorem: X is a Gaussian RV with unknown mean μ_X and known standard deviation σ_X. The relationship between a confidence interval estimate of μ_X, denoted M_n(X) - c ≤ μ_X ≤ M_n(X) + c, and the confidence coefficient 1 - α is given by

1 - α = 2·Φ(c·√n/σ_X) - 1

where Φ is the standard normal CDF.


    MARKOV CHAINS

    Markov Property

Basically, it says we only need to know the current state in order to know the probabilities of the next state, so that there is only a one-step delay dependence:

P[X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0] = P[X_{n+1} = j | X_n = i]

Markov ("Stochastic") Matrices

Let P represent a stochastic matrix. Row numbers are the states we're leaving; column numbers are the states at which we're arriving, and the (i, j)-th element P_ij is the probability of entering state j from state i.

The sum of each row must be 1, because at each time step, some action MUST be taken, whether it's to stay in the same state or move to another.

    Steady State Behavior of Homogeneous Markov Chains

We can iterate the matrix by raising it to the exponent that represents the time of interest to find the stochastic matrix at that time step: P(n) = P^n.

Under certain conditions, the matrix will converge to the steady state, where each row will contain the same steady state probability vector, denoted using the Greek lowercase letter pi:

π = π·P

π is the (left) eigenvector of P corresponding to eigenvalue 1. There can be as many steady state probability vectors as the matrix has eigenvalues equal to 1.

    Definitions

accessible - State j is accessible from state i if there's a directed path from i to j.

communicate - States i and j communicate if j is accessible from i and i is accessible from j. A group whose members communicate with each other is called a communicating class.

irreducible - In the state diagram or matrix, every state communicates with every other, and hence all states are active in the steady state, so none of them can be "reduced out".


transient - State i is transient if i communicates with state j, but j doesn't communicate with i. You can think of transient states as follows: Over time, "things" in that state will leak out to other non-transient states, until eventually there's virtually nothing left to leak out. Hence, the steady state probability of a transient state is 0.

    recurrent- Nontransient. The end.

NOTE: If there are multiple communicating classes with recurrent states, then the steady state probability vector depends on the initial probability vector (where you probably started).

period - Greatest common divisor of the lengths of all possible cycles from a state back to itself.

aperiodic - Period of the Markov chain is equal to 1.

    NOTE: At least one self loop means that the Markov chain must have period 1, and thus must be aperiodic!!!

NOTE 2: If the period is greater than 1, then the chain will oscillate between steady state vectors, and will depend on where you started.

Finding Steady State Vectors (without eigendecomposition)

    Step 1.

    Step 2. Draw the state transition diagram.

Step 3. Identify the transient and recurrent states. If there are any transient states, then you already know the probability of being in that state after reaching the steady state is 0! Hence, the element in the steady state vector that corresponds to that state is equal to 0!!!

Step 4. Determine a system of equations, where the elements of π are the unknowns. (Remember, you need as many equations as you have unknowns in order to solve a system of equations.)

a) You get one of these equations for free: because π is a probability vector, its elements must sum to 1, so: Σ_i π_i = 1

b) Draw a dashed line between one recurrent state and the rest of the states. Then, the probability of flowing into that state must equal the probability of flowing out of it. That is, write an equation where (total probability flowing in) = (total probability flowing out).


NOTE: Be sure not to draw dashed lines around transient states. Their steady state probabilities are zero, so their values won't help you solve systems of equations involving them.

    NOTE 2: Self-loops don't matter, because they're neither flowing into nor out of that state.

c) Repeat part b for a different recurrent state; do this until you have a system of as many equations as you have unknowns.

d) Solve the system for the unknown elements of the steady state vector.

e) Check your work - your results should sum to 1!!!
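The whole procedure can be cross-checked numerically by just iterating the chain until it converges. A sketch on a made-up 2-state chain (transition probabilities chosen for illustration); the balance-equation answer for this chain is π_0·0.1 = π_1·0.5 with π_0 + π_1 = 1, i.e., π = (5/6, 1/6):

```python
# Transition matrix: row = state we're leaving, column = state we're entering.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(dist, P):
    # one time step of the chain: new_j = sum_i dist_i * P[i][j]
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

dist = [1.0, 0.0]      # start in state 0
for _ in range(200):   # iterate until (numerically) converged
    dist = step(dist, P)
```

Iterating like this is the brute-force counterpart of raising P to a large power; the dashed-line balance equations get you the same π without any iteration.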