Transcript of CS145: Probability & Computing, Lecture 7 (cs.brown.edu/courses/csci1450/lecture07.pdf)

  • CS145: Probability & Computing
    Lecture 7: Expected Value, Markov Inequality, Variance, Chebyshev Inequality
    Instructor: Shahrzad Haddadan
    Brown University Computer Science
    Figure credits: Bertsekas & Tsitsiklis, Introduction to Probability, 2008; Pitman, Probability, 1999

  • CS145: Lecture 7 Outline
    - Expectation / linearity of expectation (review)
    - Markov's inequality
    - Variance
    - Chebyshev's inequality

  • Expectation
    - The expectation or expected value of a discrete random variable X is:

      E[X] = Σ_{x∈𝒳} x · p_X(x)

    - (Reminder) Consider a non-random (deterministic) function of a random variable, Y = g(X):

      E[Y] = E[g(X)] = Σ_x g(x) · p_X(x)

      In general, g(E[X]) ≠ E[g(X)].

    - Linearity of expectation: if g is linear, g(X) = aX + b, then E[g(X)] = g(E[X]).
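
    A minimal numeric sketch of the definitions above (the pmf here is a made-up example, not from the lecture): it computes E[X] and E[g(X)] from a table of probabilities, shows g(E[X]) ≠ E[g(X)] for a nonlinear g, and checks equality for a linear g.

        # Expectation of a discrete r.v. and (non)linearity of g; hypothetical pmf.
        pmf = {0: 0.2, 1: 0.5, 3: 0.3}            # p_X(x), sums to 1

        def expect(pmf, g=lambda x: x):
            """E[g(X)] = sum over x of g(x) * p_X(x)."""
            return sum(g(x) * p for x, p in pmf.items())

        EX = expect(pmf)                           # E[X] = 0*0.2 + 1*0.5 + 3*0.3 = 1.4

        # Nonlinear g: g(E[X]) != E[g(X)] in general.
        print(expect(pmf, lambda x: x ** 2), EX ** 2)        # 3.2 vs 1.96
        # Linear g: E[aX + b] = a*E[X] + b.
        a, b = 2.0, 5.0
        print(expect(pmf, lambda x: a * x + b), a * EX + b)  # both 7.8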

  • Expectation of Multiple Variables
    - The expectation or expected value of a function of two discrete random variables:

      E[g(X, Y)] = Σ_{x∈𝒳} Σ_{y∈𝒴} g(x, y) · p_XY(x, y)

    Example 1. A sum inside a linear function:

      E[aX + bY + c] = a·E[X] + b·E[Y] + c

    Linearity of expectation for two random variables: if g(X, Y) is linear, then E[g(X, Y)] = g(E[X], E[Y]).
    For example, g(X, Y) = X + Y gives E[X + Y] = E[X] + E[Y].

    Example 2. g(X, Y) = aX + bY + c.

    Linearity of expectation holds whether or not X and Y are independent.

  • Linearity of Expectation for Multiple Variables
    - The expectation or expected value of a function of n discrete random variables:

      Let X_1, X_2, X_3, ..., X_n be n random variables, having ranges 𝒳_1, 𝒳_2, ..., 𝒳_n respectively,
      and let g be a function defined on X_1, X_2, X_3, ..., X_n. We have:

      E[g(X_1, X_2, ..., X_n)] = Σ_{x_1∈𝒳_1} Σ_{x_2∈𝒳_2} … Σ_{x_n∈𝒳_n} g(x_1, x_2, ..., x_n) · p_{X_1,...,X_n}(x_1, x_2, ..., x_n)

    - Linearity of expectation. If g is a linear function:

      E[g(X_1, X_2, X_3, ..., X_n)] = g(E[X_1], E[X_2], ..., E[X_n])

      We do not need independence!

  • Going back to the binomial distribution

    Let Y be a binomial random variable with parameters n and p. Reminder: the range of Y is {0, 1, ..., n}, and E[Y] = np.

    Let X_1, X_2, X_3, ..., X_n be n Bernoulli random variables with parameter p (not necessarily independent). Reminder: the range of the joint distribution p_{X_1 X_2 ... X_n} is an n-dimensional hypercube.

    If we assume X_1, X_2, ..., X_n are independent, the probability distribution of Y = Σ_{i=1}^n X_i is binomial with parameters n and p.

  • Going back to the binomial distribution

    Let Y be a binomial random variable with parameters n and p. Reminder: E[Y] = np.

    Let X_1, X_2, X_3, ..., X_n be n Bernoulli random variables with parameter p (not necessarily independent), and let Y = Σ_{i=1}^n X_i. What is E[Y]?

    First approach. Take g(x_1, x_2, ..., x_n) = Σ_i x_i and calculate E[g] = Σ g(x_1, x_2, ..., x_n) · p_{X_1 X_2 ... X_n}(x_1, x_2, ..., x_n). But what is p_{X_1 X_2 ... X_n}?

  • Going back to the binomial distribution

    Let X_1, X_2, X_3, ..., X_n be n Bernoulli random variables, all with parameter p (not necessarily independent). What is E[Σ_i X_i]?

    Second approach. Take g(x_1, x_2, ..., x_n) = Σ_{i=1}^n x_i. The sum is linear, so use linearity of expectation:

      E[g(X_1, X_2, X_3, ..., X_n)] = g(E[X_1], E[X_2], ..., E[X_n]) = Σ_{i=1}^n E[X_i]

    No need to know p_{X_1 X_2 ... X_n}!

    E[Σ_i X_i] = np whether or not the X_i are independent!
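
    A small simulation sketch of this point (the coupling below, where every trial copies a single coin flip, is a hypothetical extreme case of dependence): the empirical mean of the sum stays near np even when the X_i are maximally dependent.

        import random

        def mean_of_sum(n=10, p=0.3, trials=100_000, dependent=True):
            """Estimate E[sum of n Bernoulli(p) variables]."""
            total = 0
            for _ in range(trials):
                if dependent:
                    # Extreme dependence: every X_i equals the same coin flip.
                    total += n * (1 if random.random() < p else 0)
                else:
                    total += sum(1 for _ in range(n) if random.random() < p)
            return total / trials

        print(mean_of_sum(dependent=True))    # ~ n*p = 3.0 despite total dependence
        print(mean_of_sum(dependent=False))   # ~ n*p = 3.0 as well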

  • Calculating Expectation
    - We know p_X:
      - We can calculate the weighted average directly.
      - It is a distribution that we know: Bernoulli, binomial, geometric.
    - We don't know p_Y, but we can write Y = g(X_1, X_2, ..., X_n), we know the p_{X_i}'s, and g is a linear function:
      we calculate the E[X_i]'s and use linearity of expectation.

    Examples (a sketch of the hat problem follows this slide):
      - Coupon collector
      - Number of successes in non-independent trials
      - The hat problem (Example 2.11 of your book)

    E[g(X_1, X_2, X_3, ..., X_n)] = g(E[X_1], E[X_2], ..., E[X_n])
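
    A minimal sketch of the hat problem via indicator variables, assuming the standard setup (n people receive a uniformly random permutation of their own n hats): with X_i the indicator that person i gets their own hat, E[X_i] = 1/n, so linearity gives E[Σ_i X_i] = 1 even though the X_i are dependent.

        import random

        def expected_matches(n, trials=50_000):
            """Average number of people who get their own hat back."""
            total = 0
            for _ in range(trials):
                perm = list(range(n))
                random.shuffle(perm)                                  # random hat assignment
                total += sum(1 for i in range(n) if perm[i] == i)     # sum of indicators
            return total / trials

        # Linearity of expectation predicts exactly 1 for every n, despite dependence.
        print(expected_matches(10))    # ~ 1.0
        print(expected_matches(100))   # ~ 1.0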

  • From Expectation to Probability Distribution

    Let X_1, X_2, X_3, ..., X_n be n random variables and consider g(X_1, X_2, ..., X_n).

    So far we have gone from a probability distribution to its expectation. Can we go from the expectation back to the probability distribution?

    Consider a r.v. Y having unknown distribution p_Y. If we know E[Y] = μ, can we conclude anything about p_Y?

  • Expectation -> Probability

    Consider a r.v. Y having unknown distribution p_Y. If we know E[Y] = μ, can we conclude anything about p_Y?

    What can we say about P(Y = E[Y])? Let Y take the values {+1, -1} with equal probability; then E[Y] = 0 but P(Y = E[Y]) = 0.

    What if we repeat n times, getting Y_1, Y_2, Y_3, ..., Y_n with E[Y] = E[Y_i]? Can we claim P(Y_i = E[Y]) > 0? No! The same example applies.

    What if, instead of looking at P(Y = E[Y]), we look at how close or far E[Y] and Y can be? Basically, can we say that P(|E[Y] - Y| > k) gets small as k gets bigger?

  • Expectation -> Probability

    What if, instead of looking at P(Y = E[Y]), we look at how close or far E[Y] and Y can be? Basically, can we say that P(|E[Y] - Y| > k) gets small as k gets bigger?

    Example 1.

    Example 2.

    [Excerpt from Pitman, Section 3.2, Expectation:]

    ... for the binomial (n, p) distribution, used in Chapter 2. In n independent trials with probability p of success on each trial, you expect to get around μ = np successes. So it is natural to say that the expected number of successes in n trials is np. This suggests the following definition of the expected value E(X) of a random variable X. For X the number of successes in n trials, this definition makes E(X) = np. See Example 7.

    Definition of Expectation. The expectation (also called expected value, or mean) of a random variable X is the mean of the distribution of X, denoted E(X). That is,

      E(X) = Σ_{all x} x · P(X = x)

    the average of all possible values of X, weighted by their probabilities.

    Random sampling. Suppose n tickets numbered x_1, ..., x_n are put in a box and a ticket is drawn at random. Let X be the x-value on the ticket drawn. Then E(X) = x̄, the ordinary average of the list of numbers in the box. This follows from the above definition, and the weighted-average formula (1) for x̄, because the distribution of X is the empirical distribution of x-values in the list:

      P(X = x) = P_n(x) = #{i : 1 ≤ i ≤ n and x_i = x} / n

    Two possible values. If X takes two possible values, say a and b, with probabilities P(a) and P(b), where P(a) + P(b) = 1, then

      E(X) = a·P(a) + b·P(b)

    This weighted average of a and b is a number between a and b, proportion P(b) of the way from a to b. The larger P(a), the closer E(X) is to a; and the larger P(b), the closer E(X) is to b. [Figure: E(X) shown as the center of mass of the distribution on a and b.]

    For arbitrary k, let Y take the values {-2k, +2k} with equal probability. Then E[Y] = 0, and yet P(|E[Y] - Y| > k) = 1.

    Bounding Deviation from Expectation

    Theorem [Markov Inequality]. For any non-negative random variable X and for all a > 0,

      Pr(X ≥ a) ≤ E[X] / a

    Proof.

      E[X] = Σ_i i · Pr(X = i) ≥ a · Σ_{i≥a} Pr(X = i) = a · Pr(X ≥ a)

    Example: What is the probability of getting more than 3N/4 heads in N coin flips? At most (N/2) / (3N/4) = 2/3.
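
    A quick numeric sketch comparing the Markov bound above with the exact binomial tail for the coin-flip example (N = 100 is just an illustrative choice): the bound 2/3 is valid but very loose.

        from math import comb

        def exact_tail(N, threshold):
            """P(X > threshold) for X ~ Binomial(N, 1/2), from the pmf."""
            return sum(comb(N, k) for k in range(threshold + 1, N + 1)) / 2 ** N

        N = 100
        markov_bound = (N / 2) / (3 * N / 4)     # E[X] / a = 2/3, independent of N
        print(markov_bound, exact_tail(N, 3 * N // 4))   # 0.666... vs a tail around 1e-7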

  • Expectation -> Probability

    What if, instead of looking at P(Y = E[Y]), we look at how close or far E[Y] and Y can be? Basically, can we say that P(|E[Y] - Y| > k) gets small as k gets bigger?

    For arbitrary k, let Y take the values {-2k, +2k} with equal probability. Then E[Y] = 0, and yet P(|E[Y] - Y| > k) = 1.

    Bounding Deviation from Expectation

    Theorem [Markov Inequality]. For any non-negative random variable X and for all a > 0,

      Pr(X ≥ a) ≤ E[X] / a

    Proof.

      E[X] = Σ_i i · Pr(X = i) ≥ a · Σ_{i≥a} Pr(X = i) = a · Pr(X ≥ a)

    Example: What is the probability of getting more than 3N/4 heads in N coin flips? At most (N/2) / (3N/4) = 2/3.

    In particular:

      P(X > 2·E[X]) ≤ 1/2
      P(X > k·E[X]) ≤ 1/k

  • Markov Inequality: Bounding Deviation from Expectation

    Theorem [Markov Inequality]. For any non-negative random variable X and for all a > 0,

      Pr(X ≥ a) ≤ E[X] / a

    Proof.

      E[X] = Σ_i i · Pr(X = i) ≥ a · Σ_{i≥a} Pr(X = i) = a · Pr(X ≥ a)

    Example: What is the probability of getting more than 3N/4 heads in N coin flips? At most (N/2) / (3N/4) = 2/3.

    [Excerpt from Bertsekas & Tsitsiklis, Sec. 7.1, Some Useful Inequalities:]

    In this section, we derive some important inequalities. These inequalities use the mean, and possibly the variance, of a random variable to draw conclusions on the probabilities of certain events. They are primarily useful in situations where the mean and variance of a random variable X are easily computable, but the distribution of X is either unavailable or hard to calculate.

    We first present the Markov inequality. Loosely speaking, it asserts that if a nonnegative random variable has a small mean, then the probability that it takes a large value must also be small.

    Markov Inequality. If a random variable X can only take nonnegative values, then

      P(X ≥ a) ≤ E[X] / a,   for all a > 0.

    To justify the Markov inequality, let us fix a positive number a and consider the random variable Y_a defined by

      Y_a = 0 if X < a,   Y_a = a if X ≥ a.

    It is seen that the relation Y_a ≤ X always holds and therefore E[Y_a] ≤ E[X]. On the other hand,

      E[Y_a] = a·P(Y_a = a) = a·P(X ≥ a),

    from which we obtain a·P(X ≥ a) ≤ E[X].

    Example 7.1. Let X be uniformly distributed on the interval [0, 4] and note that E[X] = 2. Then the Markov inequality asserts that

      P(X ≥ 2) ≤ 2/2 = 1,   P(X ≥ 3) ≤ 2/3 ≈ 0.67,   P(X ≥ 4) ≤ 2/4 = 0.5.

    In summary: fix some constant a > 0, define Y_a as above, and

      a·P(X ≥ a) = E[Y_a] ≤ E[X].

  • Markov's Inequality: Bounding Deviation from Expectation

    Theorem [Markov Inequality]. For any non-negative random variable X and for all a > 0,

      Pr(X ≥ a) ≤ E[X] / a

    Proof.

      E[X] = Σ_i i · Pr(X = i) ≥ a · Σ_{i≥a} Pr(X = i) = a · Pr(X ≥ a)

    Example: What is the probability of getting more than 3N/4 heads in N coin flips? At most (N/2) / (3N/4) = 2/3.

    - No such inequality would hold if X could take negative values. Why?
    - If a < E[X], Markov's inequality is vacuous (the bound exceeds 1), but no better bound is possible. Why?


  • Variance
    - Reminder: the expectation or expected value of a random variable is E[X] = Σ_{x∈𝒳} x · p_X(x).
    - The variance is the expected squared deviation of a random variable from its mean (the following definitions are equivalent):

      Var[X] = E[(X - E[X])²] = Σ_{x∈𝒳} (x - E[X])² · p_X(x)

      Var[X] = E[X²] - E[X]² = [Σ_{x∈𝒳} x² · p_X(x)] - [Σ_{x∈𝒳} x · p_X(x)]²

    - The standard deviation is the square root of the variance:

      σ_X = Std[X] = √(Var[X])

  • Reminder: Bernoulli Distribution
    - A Bernoulli or indicator random variable X has one parameter p:

      p_X(1) = p,   p_X(0) = 1 - p,   X ∈ {0, 1}

    - For an indicator variable, expected values are probabilities: E[X] = p.
    - Variance of the Bernoulli distribution:

      Var[X] = E[(X - p)²] = p(1 - p)

    - A fair coin (p = 0.5) has the largest variance.
    - Coins that always come up heads (p = 1.0), or always come up tails (p = 0.0), have variance 0.

  • Sums of Independent Variables

    Let X and Y be two Bernoulli r.v.'s such that P(X = Y) = 1.

    - If Z = X + Y and the random variables X and Y are independent, we have:

      E[Z] = E[X] + E[Y]          (holds for any variables X, Y)
      Var[Z] = Var[X] + Var[Y]    (only for independent X, Y)

    - Interpretation: adding independent variables increases variance:

      Var[Z] ≥ Var[X]  and  Var[Z] ≥ Var[Y]

    - Example when we do not have independence: with P(X = Y) = 1, the joint pmf is p_XY(0,0) = 1 - p, p_XY(1,1) = p, p_XY(0,1) = p_XY(1,0) = 0, so Z = X + Y = 2X and E[Z] = 2p. Then

      Var[Z] = E[(Z - 2p)²] = (2 - 2p)²·p + (1 - p)·(2p)² = (1 - p)·p·[4(1 - p) + 4p] = 4p(1 - p) = 4·Var[X] = 4·Var[Y],

      which exceeds Var[X] + Var[Y] = 2p(1 - p): variances do not simply add without independence.
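
    A small simulation sketch of this example (p = 0.3 is an arbitrary illustrative value): with X = Y the empirical variance of Z = X + Y comes out near 4p(1-p), while with independent copies it is near 2p(1-p).

        import random

        def var_of_sum(p=0.3, trials=200_000, dependent=True):
            """Empirical variance of Z = X + Y for Bernoulli(p) variables X, Y."""
            zs = []
            for _ in range(trials):
                x = 1 if random.random() < p else 0
                y = x if dependent else (1 if random.random() < p else 0)
                zs.append(x + y)
            m = sum(zs) / trials
            return sum((z - m) ** 2 for z in zs) / trials

        p = 0.3
        print(var_of_sum(p, dependent=True))    # ~ 4*p*(1-p) = 0.84
        print(var_of_sum(p, dependent=False))   # ~ 2*p*(1-p) = 0.42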

  • Sums of Independent Variables

    - Identity used in the proof: if X and Y are independent random variables,

      E[XY] = E[X]·E[Y]   whenever p_XY(x, y) = p_X(x)·p_Y(y)

      This equality does not hold for general, dependent random variables.

    - If Z = X + Y and the random variables X and Y are independent, we have:

      E[Z] = E[X] + E[Y]          (holds for any variables X, Y)
      Var[Z] = Var[X] + Var[Y]    (only for independent X, Y)

    - Interpretation: adding independent variables increases variance:

      Var[Z] ≥ Var[X]  and  Var[Z] ≥ Var[Y]

    - The standard deviation of a sum of independent variables is then

      σ_Z = √(σ_X² + σ_Y²),   where σ_X = √(Var[X]), σ_Y = √(Var[Y]), σ_Z = √(Var[Z]).

  • Linear Combination of Independent Variables

    Suppose X: Ω → 𝒳 is constant, i.e. there is some a with p_X(a) = 1. What is Var[X]? Equivalently, Var[a] = ?

    Var[X + a] = ?
    Var[aX] = ?
    Var[aX + bY + c] = ?

    Var[X + a] = Var[X] + Var[a] = Var[X]

    E[aX + bY + c] = ?
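
    A numerical sketch of the standard answers to these blanks for independent X and Y (Var[a] = 0, Var[X + a] = Var[X], Var[aX] = a²·Var[X], Var[aX + bY + c] = a²·Var[X] + b²·Var[Y], and E[aX + bY + c] = a·E[X] + b·E[Y] + c); the distributions and constants below are arbitrary illustrative choices.

        import random
        import statistics as st

        n = 200_000
        a, b, c = 2.0, -3.0, 5.0
        X = [random.random() for _ in range(n)]         # Uniform(0,1): Var = 1/12
        Y = [random.gauss(1.0, 2.0) for _ in range(n)]  # Normal(1, 2):  Var = 4

        print(st.pvariance([x + 7.0 for x in X]), st.pvariance(X))        # Var[X + a] = Var[X]
        print(st.pvariance([a * x for x in X]), a**2 * st.pvariance(X))   # Var[aX] = a^2 Var[X]
        Z = [a * x + b * y + c for x, y in zip(X, Y)]
        print(st.pvariance(Z), a**2 * st.pvariance(X) + b**2 * st.pvariance(Y))  # needs independence
        print(st.mean(Z), a * st.mean(X) + b * st.mean(Y) + c)            # linearity of expectation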

  • Binomial Probability Distribution
    - Suppose you flip n coins with bias p and count the number of heads.
    - A binomial random variable X has parameters n, p:

      p_X(k) = C(n, k) · p^k · (1 - p)^(n-k),   X ∈ {0, 1, 2, ..., n}

    - Write X = Σ_{i=1}^n X_i, where X_i is a Bernoulli variable indicating whether toss i comes up heads. Because the tosses are independent:

      E[X] = np,   Var[X] = np(1 - p)

    [Plots of the binomial pmf shown for n = 20, p = 0.5; for n = 20, p = 0.2; and for n = 100, p = 0.5.]
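
    A short sketch checking these formulas directly from the pmf for the three parameter settings plotted on the slides.

        from math import comb

        def binom_pmf(n, p):
            return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

        for n, p in [(20, 0.5), (20, 0.2), (100, 0.5)]:
            pmf = binom_pmf(n, p)
            mean = sum(k * pk for k, pk in enumerate(pmf))
            var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2
            print(n, p, round(mean, 6), round(var, 6), n * p, n * p * (1 - p))
        # The pmf-based mean and variance match np and np(1-p) up to float rounding.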

  • Variance: What comes next?
    - Expectation and variance
    - Markov's inequality
    - Variance

    Can we say P(|E[Y] - Y| > k) gets small as k gets bigger?

    - Chebyshev's inequality

  • Chebyshev's Inequality

    Theorem. For any random variable X and any a > 0,

      Pr(|X - E[X]| ≥ a) ≤ Var[X] / a²

    Proof.

      Pr(|X - E[X]| ≥ a) = Pr((X - E[X])² ≥ a²)

    By the Markov inequality,

      Pr((X - E[X])² ≥ a²) ≤ E[(X - E[X])²] / a² = Var[X] / a²


  • Chebyshev's Inequality

    Theorem. For any random variable X and any a > 0,

      Pr(|X - E[X]| ≥ a) ≤ Var[X] / a²

    Proof.

      Pr(|X - E[X]| ≥ a) = Pr((X - E[X])² ≥ a²)

    By the Markov inequality,

      Pr((X - E[X])² ≥ a²) ≤ E[(X - E[X])²] / a² = Var[X] / a²

    - Another way of parameterizing Chebyshev's inequality: with μ = E[X] and σ = √(Var[X]),

      P(|X - μ| ≥ kσ) ≤ 1/k²

    - The Chebyshev bound is vacuous (above one) for events less than one standard deviation from the mean. But such events could be likely!

  • Chebyshev's Inequality

    - Another way of parameterizing Chebyshev's inequality: with μ = E[X] and σ = √(Var[X]),

      P(|X - μ| ≥ kσ) ≤ 1/k²

    - The Chebyshev bound is vacuous (above one) for events less than one standard deviation from the mean. But such events could be likely!

    [Excerpt from Pitman, Chapter 3, Example 7; Figure 2 shows the probability bounded by Chebychev's inequality.]

    Proof. Let μ = E(X) and σ = SD(X). The first step is yet another way of writing the event [|X - μ| ≥ kσ], namely [(X - μ)² ≥ k²σ²]. Now define Y = (X - μ)² and a = k²σ², to see

      P[|X - μ| ≥ kσ] = P(Y ≥ a)
                      ≤ E(Y)/a               by Markov's inequality of Section 3.2, using Y ≥ 0
                      = σ² / (k²σ²) = 1/k²   by definition of Y, a, and σ.

    Comparison of the Chebychev bound with normal probabilities. Chebychev's inequality gives universal bounds, satisfied by all distributions, no matter what their shape. For k ≤ 1 the inequality is trivial, because then 1/k² ≥ 1. Here are the bounds for some values of k ≥ 1, compared with the corresponding probabilities for the normal distribution with parameters μ and σ:

      Probability           Chebychev bound          Normal value
      P(|X - μ| ≥ σ)        at most 1                0.3173
      P(|X - μ| ≥ 2σ)       at most 1/2² = 0.25      0.0455
      P(|X - μ| ≥ 3σ)       at most 1/3² ≈ 0.11      0.00270
      P(|X - μ| ≥ 4σ)       at most 1/4² ≈ 0.06      0.000063

    As the table shows, Chebychev's bound will be very crude for a distribution that is approximately normal. Its importance is that it holds no matter what the shape of the distribution, so it gives some information about two-sided tail probabilities whenever the mean and standard deviation of a distribution can be calculated.

    Bounds for a list of numbers. The average of a list of a million numbers is 10 and the average of the squares of the numbers is 101. Find an upper bound on how many of the entries in the list are 14 or more.
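
    A small sketch reproducing the table's normal-value column with the standard library (for a standard normal Z, P(|Z| ≥ k) = erfc(k/√2)) next to the 1/k² bound.

        from math import erfc, sqrt

        for k in (1, 2, 3, 4):
            chebychev = 1 / k**2          # universal bound
            normal = erfc(k / sqrt(2))    # exact two-sided normal tail P(|Z| >= k)
            print(k, round(chebychev, 4), round(normal, 6))
        # k = 1: 1.0 vs 0.3173;  k = 2: 0.25 vs ~0.0455;  k = 4: 0.0625 vs ~0.000063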

  • Markov vs. Chebyshev's Inequalities

    We flip a fair coin n times. What is the probability of getting more than 3n/4 heads?

    X = number of heads. E[X] = n/2, Var[X] = n/4.

    Markov's inequality:

      Pr{X ≥ 3n/4} ≤ E[X] / (3n/4) = (n/2) / (3n/4) = 2/3

    Chebyshev's inequality:

      Pr{X ≥ 3n/4} ≤ Pr{|X - n/2| ≥ n/4} ≤ Var[X] / (n/4)² = (n/4) / (n²/16) = 4/n

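
    A brief sketch comparing the two bounds with a simulated estimate of the actual probability (n = 40 is just an illustrative choice, small enough that the event is occasionally seen).

        import random

        def tail_estimate(n=40, trials=200_000):
            """Monte Carlo estimate of P(X > 3n/4) for X = # heads in n fair flips."""
            hits = sum(1 for _ in range(trials)
                       if sum(random.random() < 0.5 for _ in range(n)) > 3 * n / 4)
            return hits / trials

        n = 40
        print("Markov bound:   ", 2 / 3)
        print("Chebyshev bound:", 4 / n)              # 0.1
        print("Simulated:      ", tail_estimate(n))   # far smaller than either bound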

  • Chebyshev's Inequality

    Let X be a r.v. with E[X] = μ and Var[X] = v.

    Let X_1, X_2, ..., X_n be n copies of X (which means they all have the same expectation and variance).

    Let Y = (X_1 + X_2 + ... + X_n) / n.

  • Chebyshev's Inequality

    Let X be a r.v. with E[X] = μ and Var[X] = v.

    Let X_1, X_2, ..., X_n be n independent copies of X (which means they all have the same expectation and variance).

    Let Y = (X_1 + X_2 + ... + X_n) / n.

  • What are the expectation and variance of Y?

    Let X be a r.v. with E[X] = μ, and consider the sequence X_1, X_2, ..., X_n, ... We would like to conclude that, as n → ∞,

      P(|(X_1 + X_2 + ... + X_n)/n - μ| ≥ k) → 0

    Let X be a r.v. with E[X] = μ and Var[X] = E[(X - E[X])²] = v, and let X_1, X_2, ..., X_n be independent copies of X. Then E[Y] = μ and Var[Y] = v/n, so Chebyshev's inequality gives

      P(|(X_1 + X_2 + ... + X_n)/n - μ| ≥ k) ≤ v / (n·k²)
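
    A closing sketch of this bound: for Bernoulli(p) samples it estimates P(|sample mean - p| ≥ k) for growing n and prints the corresponding Chebyshev bound v/(n·k²) with v = p(1-p); the values p = 0.5 and k = 0.05 are arbitrary illustrative choices.

        import random

        def deviation_prob(n, p=0.5, k=0.05, trials=5_000):
            """Estimate P(|mean of n Bernoulli(p) samples - p| >= k)."""
            hits = 0
            for _ in range(trials):
                mean = sum(random.random() < p for _ in range(n)) / n
                if abs(mean - p) >= k:
                    hits += 1
            return hits / trials

        p, k, v = 0.5, 0.05, 0.25        # v = p*(1-p) is Var[X] for one Bernoulli(p) sample
        for n in (100, 400, 1600):
            print(n, deviation_prob(n), v / (n * k**2))   # empirical probability vs Chebyshev bound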