Image classification using multimodal generative-discriminative methods
Classification: Probabilistic Generative Model
Transcript of Classification: Probabilistic Generative Model
Classification: Probabilistic Generative Model
Classification
โข Credit Scoring
โข Input: income, savings, profession, age, past financial history โฆโฆ
โข Output: accept or refuse
โข Medical Diagnosis
โข Input: current symptoms, age, gender, past medical history โฆโฆ
โข Output: which kind of diseases
โข Handwritten character recognition
โข Face recognition
โข Input: image of a face, output: person
้Input:output:
Function๐ฅ Class n
Example Application
๐ = ๐ = ๐ =
Example Application
โข HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
โข Attack: the base modifier for normal attacks (eg. Scratch, Punch)
โข Defense: the base damage resistance against normal attacks
โข SP Atk: special attack, the base modifier for special attacks (e.g. fire blast, bubble beam)
โข SP Def: the base damage resistance against special attacks
โข Speed: determines which pokemon attacks first each round
Can we predict the โtypeโ of pokemon based on the information?
35
55
40
50
50
90
pokemon games (NOT pokemon cards or Pokemon Go)
Example Application
Ideal Alternatives
โข Function (Model):
โข Loss function:
โข Find the best function:
โข Example: Perceptron, SVM Not Today
๐ฅOutput = class 1
๐ ๐ฅ
๐ ๐ฅ > 0
๐๐๐ ๐
๐ฟ ๐ =
๐
๐ฟ ๐ ๐ฅ๐ โ เท๐ฆ๐
Output = class 2
The number of times f get incorrect results on training data.
How to do Classification
โข Training data for Classification
๐ฅ1, เท๐ฆ1 ๐ฅ2, เท๐ฆ2 ๐ฅ๐, เท๐ฆ๐โฆโฆ
Training: Class 1 means the target is 1; Class 2 means the target is -1
Classification as Regression?Binary classification as example
Testing: closer to 1 โ class 1; closer to -1 โ class 2
โข Multiple class: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3 โฆโฆ problematic
Penalize to the examples that are โtoo correctโ โฆ (Bishop, P186)
Class 1
x1x1
x2x2
Class 2
Class 1
Class 2
1 1
-1 -1
y = b + w1x1 + w2x2
b + w1x1 + w2x2 = 0
>>1
error
to decrease error
Two Boxes
Box 1 Box 2
P(B1) = 2/3 P(B2) = 1/3
P(Blue|B1) = 4/5
P(Green|B1) = 1/5
P(Blue|B2) = 2/5
P(Green|B2) = 3/5
Where does it come from?
P(B1 | Blue) =๐ Blue|๐ต1 ๐ ๐ต1
๐ Blue|๐ต1 ๐ ๐ต1 + ๐ Blue|๐ต2 ๐ ๐ต2
from one of the boxes
Two Classes
Given an x, which class does it belong to
=๐ ๐ฅ|๐ถ1 ๐ ๐ถ1
๐ ๐ฅ|๐ถ1 ๐ ๐ถ1 + ๐ ๐ฅ|๐ถ2 ๐ ๐ถ2๐ ๐ถ1|๐ฅ
๐ ๐ฅ = ๐ ๐ฅ|๐ถ1 ๐ ๐ถ1 + ๐ ๐ฅ|๐ถ2 ๐ ๐ถ2Generative Model
P(C1) P(C2)
P(x|C1) P(x|C2)
Estimating the ProbabilitiesFrom training data
Class 1 Class 2
Prior
Class 1 Class 2
P(C1) P(C2)
Water Normal
Water and Normal type with ID < 400 for training, rest for testing
Training: 79 Water, 61 Normal
P(C1) = 79 / (79 + 61) =0.56
P(C2) = 61 / (79 + 61) =0.44
Probability from Class
P( |Water)
79 in total
WaterType
= ?
Each Pokรฉmon is represented as a vector by its attribute. feature
P(x|C1) = ?
Probability from Class - Feature
โข Considering Defense and SP Defense
6564
4850
10345
P(x|Water)=? 0?
WaterType
๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79
Assume the points are sampled from a Gaussian distribution.
๐๐,ฮฃ ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐ ๐ฮฃโ1 ๐ฅ โ ๐
Input: vector x, output: probability of sampling x
The shape of the function determined by mean ๐ and covariance matrix ๐ฎ
Gaussian Distribution
๐ฅ1
https://blog.slinuxer.com/tag/pca
๐ฅ2๐ฅ1
๐ฅ2
๐ =00 ๐ =
23
=2 00 2
=2 00 2
๐๐,ฮฃ ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐ ๐ฮฃโ1 ๐ฅ โ ๐
Input: vector x, output: probability of sampling x
The shape of the function determines by mean ๐ and covariance matrix ๐ฎ
Gaussian Distribution
๐ฅ1
https://blog.slinuxer.com/tag/pca
๐ฅ2๐ฅ1
๐ฅ2
๐ =00 ๐ =
00
=2 00 6
=2 โ1โ1 2
Probability from ClassAssume the points are sampled from a Gaussian distribution
Find the Gaussian distribution behind them
WaterType
Probability for new points
๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79
๐ =75.071.3
ฮฃ =874 327327 929
๐๐,ฮฃ ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐ ๐ฮฃโ1 ๐ฅ โ ๐
New x
How to find them?
The Gaussian with any mean ๐ and covariance matrix ๐ดcan generate these points.
Maximum Likelihood
๐ =7575
๐ =150140
=5 00 5
=2 โ1โ1 2
๐๐,ฮฃ ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐ ๐ฮฃโ1 ๐ฅ โ ๐
= ๐๐,ฮฃ ๐ฅ1 ๐๐,ฮฃ ๐ฅ2 ๐๐,ฮฃ ๐ฅ3 โฆโฆ๐๐,ฮฃ ๐ฅ79
Likelihood of a Gaussian with mean ๐ and covariance matrix ๐ด
Different Likelihood
๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79
= the probability of the Gaussian samples ๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79
๐ฟ ๐, ฮฃ
Maximum Likelihood
๐๐,ฮฃ ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐ ๐ฮฃโ1 ๐ฅ โ ๐
๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79We have the โWaterโ type Pokรฉmons:
We assume ๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79 generate from the Gaussian (๐โ, ฮฃโ) with the maximum likelihood
= ๐๐,ฮฃ ๐ฅ1 ๐๐,ฮฃ ๐ฅ2 ๐๐,ฮฃ ๐ฅ3 โฆโฆ๐๐,ฮฃ ๐ฅ79๐ฟ ๐, ฮฃ
๐โ, ฮฃโ = ๐๐๐max๐,ฮฃ
๐ฟ ๐, ฮฃ
๐โ =1
79
๐=1
79
๐ฅ๐ ฮฃโ =1
79
๐=1
79
๐ฅ๐ โ ๐โ ๐ฅ๐ โ ๐โ ๐
average
Maximum Likelihood
Class 1: Water Class 2: Normal
๐1 =75.071.3
๐2 =55.659.8
ฮฃ1 =874 327327 929
ฮฃ2 =847 422422 685
Now we can do classification
=๐ ๐ฅ|๐ถ1 ๐ ๐ถ1
๐ ๐ฅ|๐ถ1 ๐ ๐ถ1 + ๐ ๐ฅ|๐ถ2 ๐ ๐ถ2๐ ๐ถ1|๐ฅ
If ๐ ๐ถ1|๐ฅ > 0.5 x belongs to class 1 (Water)
P(C1) = 79 / (79 + 61) =0.56
P(C2) = 61 / (79 + 61) =0.44
๐๐1,ฮฃ1 ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ1 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1
๐1 =75.071.3
๐2 =55.659.8
ฮฃ1 =874 327327 929
ฮฃ2 =847 422422 685
๐๐2,ฮฃ2 ๐ฅ =1
2๐ ๐ท/2
1
ฮฃ2 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
Howโs the results?Testing data: 47% accuracy All: hp, att, sp att, de, sp de, speed (6 features)
64% accuracy โฆ
๐ ๐ถ1|๐ฅ
Blue points: C1 (Water), Red points: C2 (Normal)
๐ ๐ถ1|๐ฅ> 0.5
๐ ๐ถ1|๐ฅ< 0.5
๐ ๐ถ1|๐ฅ> 0.5
๐ ๐ถ1|๐ฅ< 0.5
๐1, ๐2: 6-dim vector
ฮฃ1, ฮฃ2: 6 x 6 matrices
Modifying Model
Class 1: Water
๐1 =75.071.3
๐2 =55.659.8
ฮฃ1 =874 327327 929
ฮฃ2 =847 422422 685
Class 2: Normal
The same ฮฃLess parameters
Modifying Model
โข Maximum likelihood
๐ฅ1, ๐ฅ2, ๐ฅ3, โฆโฆ ,๐ฅ79โWaterโ type Pokรฉmons:
๐ฅ80, ๐ฅ81, ๐ฅ82, โฆโฆ ,๐ฅ140โNormalโ type Pokรฉmons:
๐1 ฮฃ ๐2
Find ๐1, ๐2, ฮฃ maximizing the likelihood ๐ฟ ๐1,๐2,ฮฃ
๐ฟ ๐1,๐2,ฮฃ = ๐๐1,ฮฃ ๐ฅ1 ๐๐1,ฮฃ ๐ฅ2 โฏ๐๐1,ฮฃ ๐ฅ79
ร ๐๐2,ฮฃ ๐ฅ80 ๐๐2,ฮฃ ๐ฅ81 โฏ๐๐2,ฮฃ ๐ฅ140
ฮฃ =79
140ฮฃ1 +
61
140ฮฃ2
Ref: Bishop, chapter 4.2.2
Calculate ๐1 and ๐2 as before
Modifying Model
The same covariance matrix
All: hp, att, sp att, de, sp de, speed
54% accuracy 73% accuracy
The boundary is linear
โข Function Set (Model):
โข Goodness of a function:
โข The mean ๐ and covariance ฮฃ that maximizing the likelihood (the probability of generating data)
โข Find the best function: easy
Three Steps
๐ ๐ถ1|๐ฅ =๐ ๐ฅ|๐ถ1 ๐ ๐ถ1
๐ ๐ฅ|๐ถ1 ๐ ๐ถ1 + ๐ ๐ฅ|๐ถ2 ๐ ๐ถ2๐ฅ
If ๐ ๐ถ1|๐ฅ > 0.5, output: class 1
Otherwise, output: class 2
Probability Distribution
โข You can always use the distribution you like
๐ ๐ฅ|๐ถ1
If you assume all the dimensions are independent,
then you are using Naive Bayes Classifier.
๐ฅ1๐ฅ2โฎ๐ฅ๐โฎ๐ฅ๐พ
= ๐ ๐ฅ1|๐ถ1 ๐ ๐ฅ2|๐ถ1 ๐ ๐ฅ๐|๐ถ1โฆโฆ โฆโฆ
1-D Gaussian
For binary features, you may assume they are from Bernoulli distributions.
Posterior Probability
๐ ๐ถ1|๐ฅ =๐ ๐ฅ|๐ถ1 ๐ ๐ถ1
๐ ๐ฅ|๐ถ1 ๐ ๐ถ1 + ๐ ๐ฅ|๐ถ2 ๐ ๐ถ2
=1
1 +๐ ๐ฅ|๐ถ2 ๐ ๐ถ2๐ ๐ฅ|๐ถ1 ๐ ๐ถ1
=1
1 + ๐๐ฅ๐ โ๐ง
๐ง = ๐๐๐ ๐ฅ|๐ถ1 ๐ ๐ถ1๐ ๐ฅ|๐ถ2 ๐ ๐ถ2
= ๐ ๐งSigmoid function
z
z
Warning of Math
Posterior Probability
๐ ๐ถ1|๐ฅ = ๐ ๐ง ๐ง = ๐๐๐ ๐ฅ|๐ถ1 ๐ ๐ถ1๐ ๐ฅ|๐ถ2 ๐ ๐ถ2
๐ ๐ฅ|๐ถ1 =1
2๐ ๐ท/2
1
ฮฃ1 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1
๐ ๐ฅ|๐ถ2 =1
2๐ ๐ท/2
1
ฮฃ2 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
๐ง = ๐๐๐ ๐ฅ|๐ถ1๐ ๐ฅ|๐ถ2
+ ๐๐๐ ๐ถ1๐ ๐ถ2
sigmoid
๐1๐1 +๐2๐2
๐1 +๐2
=๐1๐2
= ๐๐ฮฃ2 1/2
ฮฃ1 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1 โ ๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ
๐๐
12๐ ๐ท/2
1ฮฃ1 1/2 ๐๐ฅ๐ โ
12๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1
12๐ ๐ท/2
1ฮฃ2 1/2 ๐๐ฅ๐ โ
12๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
= ๐๐ฮฃ2 1/2
ฮฃ1 1/2โ1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1 โ ๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
๐ ๐ฅ|๐ถ1 =1
2๐ ๐ท/2
1
ฮฃ1 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1
๐ ๐ฅ|๐ถ2 =1
2๐ ๐ท/2
1
ฮฃ2 1/2๐๐ฅ๐ โ
1
2๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
๐ง = ๐๐๐ ๐ฅ|๐ถ1๐ ๐ฅ|๐ถ2
+ ๐๐๐ ๐ถ1๐ ๐ถ2 ๐1
๐2
๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1
= ๐ฅ๐ ฮฃ1 โ1๐ฅ โ 2 ๐1 ๐ ฮฃ1 โ1๐ฅ + ๐1 ๐ ฮฃ1 โ1๐1
๐ง = ๐๐ฮฃ2 1/2
ฮฃ1 1/2 โ1
2๐ฅ๐ ฮฃ1 โ1๐ฅ + ๐1 ๐ ฮฃ1 โ1๐ฅ โ
1
2๐1 ๐ ฮฃ1 โ1๐1
+1
2๐ฅ๐ ฮฃ2 โ1๐ฅ โ ๐2 ๐ ฮฃ2 โ1๐ฅ +
1
2๐2 ๐ ฮฃ2 โ1๐2
= ๐๐ฮฃ2 1/2
ฮฃ1 1/2โ1
2๐ฅ โ ๐1 ๐ ฮฃ1 โ1 ๐ฅ โ ๐1 โ ๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
๐ง = ๐๐๐ ๐ฅ|๐ถ1๐ ๐ฅ|๐ถ2
+ ๐๐๐ ๐ถ1๐ ๐ถ2
=๐1๐2
= ๐ฅ๐ ฮฃ1 โ1๐ฅ โ ๐ฅ๐ ฮฃ1 โ1๐1 โ ๐1 ๐ ฮฃ1 โ1๐ฅ + ๐1 ๐ ฮฃ1 โ1๐1
๐ฅ โ ๐2 ๐ ฮฃ2 โ1 ๐ฅ โ ๐2
= ๐ฅ๐ ฮฃ2 โ1๐ฅ โ 2 ๐2 ๐ ฮฃ2 โ1๐ฅ + ๐2 ๐ ฮฃ2 โ1๐2
+๐๐๐1๐2
End of Warning
๐ ๐ถ1|๐ฅ = ๐ ๐ง
ฮฃ1 = ฮฃ2 = ฮฃ
๐ง = ๐1 โ ๐2 ๐ฮฃโ1๐ฅ โ1
2๐1 ๐ฮฃโ1๐1 +
1
2๐2 ๐ฮฃโ1๐2 + ๐๐
๐1๐2
๐๐ปb
๐ ๐ถ1|๐ฅ = ๐ ๐ค โ ๐ฅ + ๐
๐ง = ๐๐ฮฃ2 1/2
ฮฃ1 1/2 โ1
2๐ฅ๐ ฮฃ1 โ1๐ฅ + ๐1 ๐ ฮฃ1 โ1๐ฅ โ
1
2๐1 ๐ ฮฃ1 โ1๐1
+1
2๐ฅ๐ ฮฃ2 โ1๐ฅ โ ๐2 ๐ ฮฃ2 โ1๐ฅ +
1
2๐2 ๐ ฮฃ2 โ1๐2 +๐๐
๐1๐2
In generative model, we estimate ๐1, ๐2, ๐1, ๐2, ฮฃ
Then we have w and b
How about directly find w and b?
Reference
โข Bishop: Chapter 4.1 โ 4.2
โข Data: https://www.kaggle.com/abcsds/pokemon
โข Useful posts:โข https://www.kaggle.com/nishantbhadauria/d/abcsds/po
kemon/pokemon-speed-attack-hp-defense-analysis-by-type
โข https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion
โข https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion