Page 1

Conditional distributions

Will Monroe, July 26, 2017

with materials by Mehran Sahami and Chris Piech

Page 2

Independence of discrete random variables

Two random variables are independent if knowing the value of one tells you nothing about the value of the other (for all values!).

X ⊥ Y iff ∀ x, y:

P(X=x, Y=y) = P(X=x) P(Y=y)

- or -

p_{X,Y}(x, y) = p_X(x) p_Y(y)
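As a quick illustration (a Python sketch, not from the slides, using a made-up joint PMF): independence can be checked numerically by comparing a joint PMF table to the outer product of its marginals.

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.30],
                 [0.15, 0.45]])

p_x = p_xy.sum(axis=1)   # marginal p_X(x)
p_y = p_xy.sum(axis=0)   # marginal p_Y(y)

# X ⊥ Y iff p_{X,Y}(x, y) = p_X(x) p_Y(y) for every (x, y).
print(np.allclose(p_xy, np.outer(p_x, p_y)))   # True for this table
```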

Page 3

Independence of continuous random variables

Two random variables are independent if knowing the value of one tells you nothing about the value of the other (for all values!).

X ⊥ Y iff ∀ x, y:

f_{X,Y}(x, y) = f_X(x) f_Y(y)

- or -

F_{X,Y}(x, y) = F_X(x) F_Y(y)

- or -

f_{X,Y}(x, y) = g(x) h(y)

Page 4

Review: Sum of independent binomials

X ∼ Bin(n, p)    Y ∼ Bin(m, p)

X: number of heads in first n flips
Y: number of heads in next m flips

X + Y ∼ Bin(n + m, p)

More generally:

X_i ∼ Bin(n_i, p), all X_i independent ⇒ ∑_{i=1}^N X_i ∼ Bin(∑_{i=1}^N n_i, p)
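A quick numerical check of this closure property (a scipy sketch, not part of the slides): convolving the PMFs of Bin(n, p) and Bin(m, p) reproduces Bin(n + m, p).

```python
import numpy as np
from scipy import stats

n, m, p = 10, 15, 0.3
pmf_x = stats.binom.pmf(np.arange(n + 1), n, p)
pmf_y = stats.binom.pmf(np.arange(m + 1), m, p)

# PMF of X + Y via discrete convolution of the two PMFs.
pmf_sum = np.convolve(pmf_x, pmf_y)

# Compare with Bin(n + m, p) directly.
pmf_direct = stats.binom.pmf(np.arange(n + m + 1), n + m, p)
print(np.allclose(pmf_sum, pmf_direct))   # True
```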

Page 5

Review: Sum of independent Poissons

X ∼ Poi(λ₁)    Y ∼ Poi(λ₂)

X: number of chips in first cookie (λ₁ chips/cookie)
Y: number of chips in second cookie (λ₂ chips/cookie)

X + Y ∼ Poi(λ₁ + λ₂)

More generally:

X_i ∼ Poi(λ_i), all X_i independent ⇒ ∑_{i=1}^N X_i ∼ Poi(∑_{i=1}^N λ_i)
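The same kind of check works for Poissons (a sketch, not from the slides), using a truncated support since a Poisson has infinitely many values:

```python
import numpy as np
from scipy import stats

lam1, lam2 = 3.0, 5.0
k = np.arange(60)   # truncated support; the tail mass beyond 59 is negligible here

pmf_sum = np.convolve(stats.poisson.pmf(k, lam1), stats.poisson.pmf(k, lam2))[:len(k)]
pmf_direct = stats.poisson.pmf(k, lam1 + lam2)
print(np.max(np.abs(pmf_sum - pmf_direct)))   # ≈ 0
```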

Page 6

Review: Convolution

A convolution is the distribution of the sum of two independent random variables.

f_{X+Y}(a) = ∫_{−∞}^{∞} f_X(a − y) f_Y(y) dy
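For instance (a numerical sketch, not from the slides), convolving two standard normal densities with this integral recovers the N(0, 2) density, consistent with the sum-of-normals rule on the next slide:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def f_sum(a):
    # f_{X+Y}(a) = ∫ f_X(a − y) f_Y(y) dy  with X, Y ~ N(0, 1) independent
    return quad(lambda y: stats.norm.pdf(a - y) * stats.norm.pdf(y), -np.inf, np.inf)[0]

a = 1.3
print(f_sum(a), stats.norm.pdf(a, scale=np.sqrt(2)))   # the two values agree
```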

Page 7

Review: Sum of independent normals

X ∼ N(μ₁, σ₁²)    Y ∼ N(μ₂, σ₂²)

X + Y ∼ N(μ₁ + μ₂, σ₁² + σ₂²)

More generally:

X_i ∼ N(μ_i, σ_i²), all X_i independent ⇒ ∑_{i=1}^N X_i ∼ N(∑_{i=1}^N μ_i, ∑_{i=1}^N σ_i²)

Page 8

Virus infections

150 computers in a dorm:

50 Macs (each independently infected with probability 0.1)
100 PCs (each independently infected with probability 0.4)

What is P(≥ 40 machines infected)?

M: # infected Macs, M ∼ Bin(50, 0.1) ≈ X ∼ N(5, 4.5)
P: # infected PCs, P ∼ Bin(100, 0.4) ≈ Y ∼ N(40, 24)

W = X + Y ∼ N(5 + 40, 4.5 + 24) = N(45, 28.5)

P(M + P ≥ 40) ≈ P(X + Y ≥ 39.5)    (continuity correction)

P(W ≥ 39.5) = P((W − 45)/√28.5 ≥ (39.5 − 45)/√28.5) ≈ 1 − Φ(−1.03) ≈ 0.8485
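A sketch reproducing this calculation (not part of the slides), both the normal approximation with the continuity correction and the exact tail probability from the two binomials:

```python
import numpy as np
from scipy import stats

# Normal approximation: W = X + Y ~ N(45, 28.5)
print(1 - stats.norm.cdf(39.5, loc=45, scale=np.sqrt(28.5)))   # ≈ 0.8485

# Exact answer: convolve Bin(50, 0.1) and Bin(100, 0.4), then sum the tail from 40 up.
pmf_m = stats.binom.pmf(np.arange(51), 50, 0.1)
pmf_p = stats.binom.pmf(np.arange(101), 100, 0.4)
print(np.convolve(pmf_m, pmf_p)[40:].sum())                    # exact P(M + P ≥ 40)
```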

Page 9

Review: Conditional probability

The conditional probability P(E | F) is the probability that E happens, given that F has happened. F is the new sample space.

P(E | F) = P(EF) / P(F)

(Venn diagram: events E and F inside sample space S, overlapping in the region EF)

Page 10

Discrete conditional distributions

The value of a random variable, conditioned on the value of some other random variable, has a probability distribution.

p_{X|Y}(x | y) = P(X=x, Y=y) / P(Y=y) = p_{X,Y}(x, y) / p_Y(y)
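A minimal sketch (not from the slides, with a made-up table): the conditional PMF of X given Y = y is the y-th column of the joint PMF divided by the marginal p_Y(y).

```python
import numpy as np

# Hypothetical joint PMF p_{X,Y}(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.20],
                 [0.05, 0.25],
                 [0.15, 0.25]])

p_y = p_xy.sum(axis=0)           # marginal p_Y(y)

# p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y); every column is now a distribution.
p_x_given_y = p_xy / p_y
print(p_x_given_y.sum(axis=0))   # [1. 1.]
```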

Page 11

Conditionals from a joint PMF

p_{R|Y}(1 | 3) = P(R=1 | Y=3) = P(R=1, Y=3) / P(Y=3) = 0.19 / 0.50 ≈ 0.38

(joint PMF table of R ∈ {0, 1, 2} and Y ∈ {1, …, 5} shown on slide)

Page 12

Conditionals from a joint PMF

p_{R|Y}(r | y) = P(R=r, Y=y) / P(Y=y)

(joint PMF of R ∈ {0, 1, 2} and Y ∈ {1, …, 5}, and the resulting conditional PMFs, shown as tables on slide)

Page 13

More web server hits

Your web server gets X requests from humans and Y requests from bots in a day, independently.

X ~ Poi(λ₁), Y ~ Poi(λ₂)

so X + Y ~ Poi(λ₁ + λ₂)

P(X=k | X+Y=n) = P(X=k, Y=n−k) / P(X+Y=n)

  = P(X=k) P(Y=n−k) / P(X+Y=n)    (independence)

  = [e^{−λ₁} λ₁^k / k!] · [e^{−λ₂} λ₂^{n−k} / (n−k)!] · [n! / (e^{−(λ₁+λ₂)} (λ₁+λ₂)^n)]

  = [n! / (k! (n−k)!)] · [λ₁^k λ₂^{n−k} / (λ₁+λ₂)^n]

  = (n choose k) (λ₁/(λ₁+λ₂))^k (λ₂/(λ₁+λ₂))^{n−k}

→ (X | X+Y = n) ∼ Bin(n, λ₁/(λ₁+λ₂))
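A simulation sketch (not from the slides) of this result: among days where X + Y = n, the human request count behaves like a Bin(n, λ₁/(λ₁+λ₂)) draw.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, n = 4.0, 6.0, 10
x = rng.poisson(lam1, 1_000_000)
y = rng.poisson(lam2, 1_000_000)

x_given_sum = x[x + y == n]                           # condition on X + Y = n
print(x_given_sum.mean(), n * lam1 / (lam1 + lam2))   # both ≈ 4, the Bin(n, λ₁/(λ₁+λ₂)) mean
```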

Page 14

Continuous conditional distributions

The value of a random variable, conditioned on the value of some other random variable, has a probability distribution.

f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)

Page 15

Ratios of continuous probabilities

The probability of an exact value for a continuous random variable is 0.

But ratios of these probabilities are still well-defined!

P(X=a) / P(X=b) = f_X(a) / f_X(b)

Page 16

Defining the undefined

P(X=a) / P(X=b)
  = P(X ≈ a) / P(X ≈ b)
  = lim_{ε→0} P(a−ε ≤ X ≤ a+ε) / P(b−ε ≤ X ≤ b+ε)
  = lim_{ε→0} [∫_{a−ε}^{a+ε} f_X(x) dx] / [∫_{b−ε}^{b+ε} f_X(x) dx]
  = 2ε f_X(a) / (2ε f_X(b))
  = f_X(a) / f_X(b)

(figure: density curve with intervals of width 2ε around a and b, whose areas are ≈ 2ε f_X(a) and ≈ 2ε f_X(b) when ε is small)

Page 17

Conditioning on a continuous RV

f_{X|Y}(x | y) = P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y) = f_{X,Y}(x, y) / f_Y(y)

Page 18

Mixing discrete and continuous

P(a₁ ≤ X ≤ a₂, b₁ ≤ N ≤ b₂) = ∫_{a₁}^{a₂} ∑_{n=b₁}^{b₂} f_{X,N}(x, n) dx

p_{N|X}(n | x) = f_{X,N}(x, n) / f_X(x)

f_{X|N}(x | n) = f_{X,N}(x, n) / p_N(n)

Page 19

Discrete + Continuous Bayes

f_{X|N}(x | n) = p_{N|X}(n | x) f_X(x) / p_N(n)

p_{N|X}(n | x) = f_{X|N}(x | n) p_N(n) / f_X(x)

(diagram: continuous X and discrete N, linked by P(N | X))

Page 20

Break time!

Page 21

The probability of a probability

Page 22

Beta random variable

A beta random variable models the probability of a trial's success, given previous trials. The PDF/CDF let you compute probabilities of probabilities!

X ∼ Beta(a, b)

f_X(x) = { C x^{a−1} (1−x)^{b−1}   if 0 < x < 1
         { 0                       otherwise
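For example (a scipy sketch, not from the slides, with made-up parameters), the Beta PDF and CDF answer questions about an unknown success probability:

```python
from scipy import stats

a, b = 4, 8                         # hypothetical belief about a success probability
print(stats.beta.pdf(0.3, a, b))    # density of that belief at x = 0.3
print(stats.beta.cdf(0.5, a, b))    # P(success probability < 0.5)
```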

Page 23

Estimating an unknown probability

You roll a loaded die N times and get A sixes (and N − A non-sixes).

What’s the probability that the die is loaded such that sixes come up less than 1/4 of the time?

X: probability of getting a six
A: number of sixes in N rolls

f_{X|A}(x | a) = P(A=a | X=x) f_X(x) / P(A=a)

A | X ~ Bin(N, X)
X ~ Uni(0, 1)    (“I know nothing”)

  = [1 / P(A=a)] · (N choose a) x^a (1−x)^{N−a} · 1

  = C x^a (1−x)^{N−a}

???
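A sketch with made-up counts (not from the slides): as the following slides show, the posterior under a Uni(0, 1) prior is Beta(a + 1, N − a + 1), so the question “do sixes come up less than 1/4 of the time?” is just its CDF at 1/4.

```python
from scipy import stats

N, a = 60, 8                                 # hypothetical data: 8 sixes in 60 rolls
posterior = stats.beta(a + 1, N - a + 1)     # Beta(a + 1, N - a + 1) posterior under a Uni(0, 1) prior

# P(die is loaded such that sixes come up less than 1/4 of the time | data)
print(posterior.cdf(0.25))
```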

Page 24

Beta: Fact sheet

X ∼ Beta(a, b)

a: number of successes + 1
b: number of failures + 1
X: probability of success

PDF: f_X(x) = { C x^{a−1} (1−x)^{b−1}   if 0 < x < 1
              { 0                       otherwise

expectation: E[X] = a / (a + b)

variance: Var(X) = ab / ((a + b)² (a + b + 1))
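A quick check of these formulas against scipy (a sketch, not from the slides):

```python
from scipy import stats

a, b = 3, 7
mean, var = stats.beta.stats(a, b, moments='mv')
print(mean, a / (a + b))                           # both 0.3
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))   # both ≈ 0.0191
```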

Page 25

Beta takes many forms

Page 26

Conjugate distribution

X ∼ Beta(1, 1)

f_X(x) = C x^{1−1} (1−x)^{1−1}   if 0 < x < 1
       = C x^0 (1−x)^0
       = C

∫₀¹ C dx = 1 ⇒ C = 1

⇒ Beta(1, 1) = Uni(0, 1)

f_{X|A}(x | a) = P(A=a | X=x) f_X(x) / P(A=a)

X ~ Beta(1, 1): “prior”
P(A=a | X=x): “likelihood”
P(A=a): “normalizing constant”
X | A ~ Beta(a + 1, N − a + 1): “posterior”

Page 27

Subjective priors

f_{X|A}(x | a) = P(A=a | X=x) f_X(x) / P(A=a)

X ~ Beta(1, 1): “prior”
X | A ~ Beta(a + 1, N − a + 1): “posterior”

How did we decide on Beta(1, 1) for the prior?

Beta(1, 1): “we haven’t seen any rolls yet.”
Beta(4, 1): “we’ve seen 3 sixes and 0 non-sixes.”
Beta(2, 6): “we’ve seen 1 six and 5 non-sixes.”

Beta prior = “imaginary” previous trials

Page 28

Advanced: Dirichlet distribution

Beta is the distribution (“conjugate prior”) for the p in the Bernoulli and binomial.

Dirichlet is the distribution for the p₁, p₂, … in the multinomial.

X₁, X₂, … ∼ Dir(a₁, a₂, …)

f_{X₁,X₂,…}(x₁, x₂, …) = { C x₁^{a₁−1} x₂^{a₂−1} ⋯   if 0 < x₁, x₂, … < 1 and x₁ + x₂ + ⋯ = 1
                         { 0                         otherwise
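A small sketch (not from the slides): a Dirichlet draw is a random probability vector, so it sums to 1 and can serve as the p₁, p₂, … of a multinomial.

```python
import numpy as np

rng = np.random.default_rng(0)

# One draw from Dir(2, 3, 5): a random probability vector (p1, p2, p3).
p = rng.dirichlet([2, 3, 5])
print(p, p.sum())                 # components in (0, 1), summing to 1

# Use the draw as the category probabilities of a multinomial.
print(rng.multinomial(100, p))    # counts over 3 categories for 100 trials
```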

Page 29

Frequentists vs. Bayesians

image: Eric Kilby

Frequentist

A probability is the (real or theoretical) result of a number of experiments.

All probabilities are based on objective experiences.

Bayesian

A probability is a belief.

All probabilities are based on subjective priors.

(It’s not really a debate anymore—real statisticians / data scientists / machine learning practitioners can and do think both ways!)