Transcript of i-systems.github.io/HSE545/machine learning all/09 Probability/01_Probability.pdf
Probability for Machine Learning
by Prof. Seungchul Lee, iSystems Design Lab, UNIST (http://isystems.unist.ac.kr/)
Table of Contents

1. Random Variable (= r.v.)
   - Expectation = mean
2. Random Vectors (multivariate R.V.)
   - 2.1. Joint probability density
   - 2.2. Marginal probability density
   - 2.3. Conditional probability
3. Bayes Rule
4. Linear Transformation of Random Variables
   - 4.1. For single random variable
   - 4.2. Sum of two random variables X and Y
   - 4.3. Affine transformation of random vectors
1. Random Variable (= r.v.)

(Rough) definition: a variable with a probability.

Probability that $x = a$:

$$P_X(x = a) = P(x = a)$$

satisfying

1) $P(x = a) \ge 0$
2) $\sum_{\text{all } x} P(x) = 1$

$x$ is a continuous r.v. if $x$ is continuous, and a discrete r.v. if $x$ is discrete.
Example

$x$: die outcome

$$P(X = 1) = P(X = 2) = \cdots = P(X = 6) = \frac{1}{6}$$

Question

$y = x_1 + x_2$: sum of two dice. $P_Y(y = 5) = ?$

Expectation = mean

$$E[x] = \begin{cases} \sum_x x\,P(x) & \text{discrete} \\ \int x\,P(x)\,dx & \text{continuous} \end{cases}$$

Sample mean: $E[x] \approx \frac{1}{m}\sum_{i=1}^{m} x_i$ ($\because$ uniform distribution assumed)

Variance: $\operatorname{var}[x] = E\left[(x - E[x])^2\right]$, the mean square deviation from the mean.
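The original slides leave $P_Y(y=5)$ as a question; a quick enumeration sketch (plain Python, variable names are my own) answers it and checks the discrete mean and variance formulas on a single fair die:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))

# P_Y(y = 5): count the pairs summing to 5 -> (1,4), (2,3), (3,2), (4,1)
p_y5 = Fraction(sum(1 for a, b in outcomes if a + b == 5), len(outcomes))
print(p_y5)  # 1/9

# Expectation and variance of one die via the discrete formulas above
p = Fraction(1, 6)                      # uniform P(X = k)
mean = sum(k * p for k in range(1, 7))  # E[x] = sum_x x P(x)
var = sum((k - mean) ** 2 * p for k in range(1, 7))  # E[(x - E[x])^2]
print(mean, var)  # 7/2 35/12
```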
2. Random Vectors (multivariate R.V.)

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad n \text{ random variables}$$

2.1. Joint probability density

The joint probability density models the probability of co-occurrence of many r.v.:

$$P_{X_1, \cdots, X_n}(X_1 = x_1, \cdots, X_n = x_n)$$
2.2. Marginal probability density

$$P_{X_1}(X_1 = x_1), \; \ldots, \; P_{X_n}(X_n = x_n)$$

For two r.v.:

$$P(X) = \sum_y P(X, Y = y), \qquad P(Y) = \sum_x P(X = x, Y)$$

2.3. Conditional probability

Probability of one event when we know the outcome of the other:

$$P_{X_1 \mid X_2}(X_1 = x_1 \mid X_2 = x_2) = \frac{P(X_1 = x_1, X_2 = x_2)}{P(X_2 = x_2)} \; : \; \text{conditional prob. of } x_1 \text{ given } x_2$$
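These two definitions are mechanical once the joint is tabulated. A minimal sketch with a made-up joint table (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical joint table P_XY over x in {0,1} (rows), y in {0,1,2} (columns);
# entries sum to 1
P_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# Marginals: sum the joint over the other variable
P_x = P_xy.sum(axis=1)   # P(X = x) = sum_y P(X = x, Y = y)
P_y = P_xy.sum(axis=0)   # P(Y = y) = sum_x P(X = x, Y = y)

# Conditional: divide the joint by the marginal of the conditioning variable
P_x_given_y0 = P_xy[:, 0] / P_y[0]   # P(X = x | Y = 0)

print(P_x, P_y, P_x_given_y0)
```

Note that each conditional slice renormalizes to 1, which is exactly what dividing by the marginal accomplishes.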
Independent random variables

When one tells nothing about the other:

$$P(X_1 = x_1 \mid X_2 = x_2) = P(X_1 = x_1)$$
$$\Updownarrow$$
$$P(X_2 = x_2 \mid X_1 = x_1) = P(X_2 = x_2)$$
$$\Updownarrow$$
$$P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1)\,P(X_2 = x_2)$$

Example

Four dice $\omega_1, \omega_2, \omega_3, \omega_4$:

$$x = \omega_1 + \omega_2 \; : \; \text{sum of the first two dice}$$
$$y = \omega_1 + \omega_2 + \omega_3 + \omega_4 \; : \; \text{sum of all four dice}$$

Probability of $\begin{bmatrix} x \\ y \end{bmatrix} = \, ?$
Marginal probability:

$$P_X(x) = \sum_y P_{XY}(x, y)$$
Conditional probability: suppose we measured

$$y = 19$$

$$P_{X \mid Y}(x \mid y = 19) = \, ?$$
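The slides answer this with plots; the same joint, marginal, and conditional can be obtained by enumerating all $6^4$ equally likely rolls (a sketch, not code from the original notebook):

```python
from collections import Counter
from itertools import product

# Build the joint P_XY(x, y) as counts over all 6^4 rolls of four dice
joint = Counter()
for w in product(range(1, 7), repeat=4):
    x = w[0] + w[1]    # sum of the first two dice
    y = sum(w)         # sum of all four dice
    joint[(x, y)] += 1

total = 6 ** 4

# Marginal: P_X(x) = sum_y P_XY(x, y) -- recovers the two-dice distribution
P_x = {x: sum(c for (xx, y), c in joint.items() if xx == x) / total
       for x in range(2, 13)}

# Conditional: P_{X|Y}(x | y = 19) = P_XY(x, 19) / P_Y(19)
n_y19 = sum(c for (x, y), c in joint.items() if y == 19)
P_x_given_y19 = {x: joint[(x, 19)] / n_y19
                 for x in range(2, 13) if joint[(x, 19)] > 0}

print(P_x_given_y19)
```

Knowing $y = 19$ shifts the distribution of $x$ upward relative to its marginal, which is the "one event tells us about the other" behavior: $x$ and $y$ share $\omega_1 + \omega_2$, so they are not independent.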
Pictorial Explanation
Example
Suppose we have three bins, labeled A, B, and C. Two of the bins have only white balls, and one bin has only black balls.

1) We take one ball; what is the probability that it is white? (white = 1)
2) When a white ball has been drawn from bin C, what is the probability of drawing a white ball from bin B?
3) When two balls have been drawn from two different bins, what is the probability of drawing two white balls?
Answers:

1) $P(X_1 = 1) = \frac{2}{3}$
2) $P(X_2 = 1 \mid X_1 = 1) = \frac{1}{2}$
3) $P(X_1 = 1, X_2 = 1) = P(X_2 = 1 \mid X_1 = 1)\,P(X_1 = 1) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$

3. Bayes Rule

Enables us to swap $A$ and $B$ in conditional probability:

$$P(X_2, X_1) = P(X_2 \mid X_1)\,P(X_1) = P(X_1 \mid X_2)\,P(X_2)$$

$$\therefore \; P(X_2 \mid X_1) = \frac{P(X_1 \mid X_2)\,P(X_2)}{P(X_1)}$$
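The rule is a one-liner once the denominator is expanded by total probability. A minimal sketch (the function name and the two-hypothesis framing are mine; the numbers are the male/female smoker figures used in the lecture's example):

```python
def bayes(prior, likelihood):
    """Posterior P(H_i | E) from priors P(H_i) and likelihoods P(E | H_i)."""
    # Total probability of the evidence: P(E) = sum_i P(E | H_i) P(H_i)
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    return [p * l / evidence for p, l in zip(prior, likelihood)]

# P(x = M) = 0.4, P(x = F) = 0.6; P(y = S | x = M) = 0.5, P(y = S | x = F) = 0.3
posterior = bayes([0.4, 0.6], [0.5, 0.3])
print(posterior[0])  # P(x = M | y = S) = 0.20 / 0.38
```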
Example

Suppose that in a group of people, 40% are male and 60% are female. 50% of the males are smokers, and 30% of the females are smokers. Find the probability that a smoker is male.

Bayes' rule + conditional probability:

$x = M$ or $F$, $\quad y = S$ or $N$

$$P(x = M) = 0.4, \quad P(x = F) = 0.6$$
$$P(y = S \mid x = M) = 0.5, \quad P(y = S \mid x = F) = 0.3$$
$$P(x = M \mid y = S) = \, ?$$

$$P(y = S) = P(y = S \mid x = M)P(x = M) + P(y = S \mid x = F)P(x = F) = 0.5 \times 0.4 + 0.3 \times 0.6 = 0.38$$

$$P(x = M \mid y = S) = \frac{P(y = S \mid x = M)\,P(x = M)}{P(y = S)} = \frac{0.20}{0.38} \approx 0.53$$

Example (History)

$$\left(\frac{9}{10}\right)^{20} \approx 0.1216$$

4. Linear Transformation of Random Variables
4.1. For single random variable

$$X \mapsto Y = aX$$

$$E[aX] = a\,E[X], \qquad \operatorname{var}(aX) = a^2 \operatorname{var}(X)$$

With $\bar X \triangleq E[X]$:

$$\operatorname{var}(X) = E[(X - E[X])^2] = E[(X - \bar X)^2] = E[X^2 - 2\bar X X + \bar X^2] = E[X^2] - 2\bar X E[X] + \bar X^2 = E[X^2] - E[X]^2$$

4.2. Sum of two random variables X and Y

Note: quality control in manufacturing process.

$$Z = X + Y \quad \text{(still univariate)}$$

$$E[X + Y] = E[X] + E[Y]$$

$$\operatorname{var}(X + Y) = E\left[(X + Y - E[X + Y])^2\right] = E\left[((X - \bar X) + (Y - \bar Y))^2\right]$$
$$= E[(X - \bar X)^2] + E[(Y - \bar Y)^2] + 2E[(X - \bar X)(Y - \bar Y)]$$

$$\operatorname{cov}(X, Y) = E[(X - \bar X)(Y - \bar Y)] = E[XY - \bar Y X - \bar X Y + \bar X \bar Y] = E[XY] - \bar Y E[X] - \bar X E[Y] + \bar X \bar Y = E[XY] - E[X]E[Y]$$

$$\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y)$$
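These identities hold exactly for sample moments as well, so they can be checked numerically; a sketch with an arbitrary correlated pair (the distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200_000

# Two correlated samples (Y depends on X, so cov(X, Y) != 0)
X = rng.normal(2.0, 1.5, m)
Y = 0.5 * X + rng.normal(0.0, 1.0, m)
a = 3.0

# E[aX] = a E[X] and var(aX) = a^2 var(X)
assert np.isclose((a * X).mean(), a * X.mean())
assert np.isclose((a * X).var(), a**2 * X.var())

# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)
cov = ((X - X.mean()) * (Y - Y.mean())).mean()   # E[XY] - E[X]E[Y]
assert np.isclose((X + Y).var(), X.var() + Y.var() + 2 * cov)

print("identities hold on the sample")
```

If $X$ and $Y$ were independent, the covariance term would vanish and the variances would simply add.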
Remark

- variance: for a single variable (univariate)
- covariance: for a pair of variables (bivariate)

Covariance of two r.v.:

$$\operatorname{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$$

Covariance matrix for random vectors:

$$\operatorname{cov}(X) = E\left[(X - \mu)(X - \mu)^T\right] = \begin{bmatrix} \operatorname{cov}(X_1, X_1) & \operatorname{cov}(X_1, X_2) \\ \operatorname{cov}(X_2, X_1) & \operatorname{cov}(X_2, X_2) \end{bmatrix} = \begin{bmatrix} \operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) \\ \operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) \end{bmatrix}$$

Moments provide rough clues on the probability distribution:

$$\int x^k P_x(x)\,dx \quad \text{or} \quad \sum_x x^k P_x(x)$$

4.3. Affine transformation of random vectors

$$y = Ax + b$$

1. $E[y] = A\,E[x] + b$
2. $\operatorname{cov}(y) = A\,\operatorname{cov}(x)\,A^T$

IID random variables: independent and identically distributed.

Suppose $x_1, x_2, \cdots, x_m$ are IID with mean $\mu$ and variance $\sigma^2$. Let

$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \quad \text{then} \quad E[x] = \begin{bmatrix} \mu \\ \vdots \\ \mu \end{bmatrix}, \quad \operatorname{cov}(x) = \begin{bmatrix} \sigma^2 & & \\ & \ddots & \\ & & \sigma^2 \end{bmatrix}$$
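The two affine-transformation rules can be verified on samples, since sample mean and sample covariance are themselves linear/bilinear in the data. A sketch with an arbitrary $A$ and $b$ (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500_000

# Random vector x in R^2 with a known correlated distribution; rows are samples
x = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]], m)

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([5.0, -1.0])

y = x @ A.T + b   # y = Ax + b, applied to each sample (row)

# 1. E[y] = A E[x] + b, checked on sample means
assert np.allclose(y.mean(axis=0), A @ x.mean(axis=0) + b)

# 2. cov(y) = A cov(x) A^T, checked on sample covariances
assert np.allclose(np.cov(y.T), A @ np.cov(x.T) @ A.T)

print("affine transformation rules hold")
```

The shift $b$ moves the mean but drops out of the covariance, which is why rule 2 has no $b$ in it.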
Sum of IID random variables ($\Longrightarrow$ single r.v.)

$$S_m = \frac{1}{m}\sum_{i=1}^{m} x_i \;\Longrightarrow\; S_m = Ax \quad \text{where } A = \frac{1}{m}\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}$$

$$E[S_m] = A\,E[x] = \frac{1}{m}\begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} \mu \\ \vdots \\ \mu \end{bmatrix} = \frac{1}{m}\,m\mu = \mu$$

$$\operatorname{var}(S_m) = A\,\operatorname{cov}(x)\,A^T = A\begin{bmatrix} \sigma^2 & & \\ & \ddots & \\ & & \sigma^2 \end{bmatrix}A^T = \frac{\sigma^2}{m}$$

Averaging reduces the variance by a factor of $m$ $\Longrightarrow$ Law of large numbers, Central limit theorem:

$$\bar{x} \longrightarrow \mathcal{N}\!\left(\mu, \left(\frac{\sigma}{\sqrt{m}}\right)^2\right)$$

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')
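The $\sigma^2/m$ shrinkage is easy to see empirically. A sketch that averages $m$ die rolls over many repeated experiments (the trial counts are arbitrary) and compares the observed variance of $S_m$ to $\sigma^2/m$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, var_die = 3.5, 35 / 12   # mean and variance of a fair die

# For each m, estimate var(S_m) over many repeated experiments
for m in (1, 10, 100):
    rolls = rng.integers(1, 7, size=(20_000, m))  # 20k experiments of m rolls
    S_m = rolls.mean(axis=1)                      # sample mean of each experiment
    print(m, S_m.var(), var_die / m)              # empirical vs sigma^2 / m
```

As $m$ grows, $S_m$ concentrates around $\mu = 3.5$ (law of large numbers) and its histogram approaches a Gaussian with variance $\sigma^2/m$ (central limit theorem).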