General statistical inference for discrete and mixed spaces by an approximate application of the maximum entropy principle
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 3, MAY 2000

Authors: Lian Yan and David J. Miller
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
Outline
- Motivation
- Objective
- Introduction
- Maximum Entropy Joint PMF
- Extensions for More General Inference Problems
- Experimental Results
- Conclusions and Possible Extensions
N.Y.U.S.T.
I.M.
Motivation
- The maximum entropy (ME) joint probability mass function (pmf) is powerful and does not require expressing conditional independence.
- However, the huge learning complexity has severely limited the use of this approach.
Objective
- Propose an approach with quite tractable learning.
- Extend the approach to mixed (discrete and continuous) data.
1. Introduction
- With the joint probability mass function (pmf), one can compute a posteriori probabilities for a single, fixed feature given knowledge of the remaining feature values: statistical classification, even with some feature values missing.
- One can also classify any (e.g., user-specified) discrete feature dimensions given values for the other features: generalized classification.
1. Introduction
- Multiple Networks Approach
- Bayesian Networks
- Maximum Entropy Models
- Advantages of the Proposed ME Method over BN's
1.1 Multiple Networks Approach
- With multilayer perceptrons (MLP's), radial basis functions, or support vector machines, one would train one network for each feature to be inferred.
- Example: classifying documents to multiple topics, where one network makes an individual yes/no decision for the presence of each possible topic. This is the multiple networks approach.
1.1 Multiple Networks Approach
Several potential difficulties:
- increased learning and storage complexities
- accuracy of inferences suffers, since the approach ignores dependencies between features
- Example: the networks predict F1 = 1 and F2 = 1 respectively, but the joint event (F1 = 1, F2 = 1) has zero probability.
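A minimal numeric sketch of this failure mode, with made-up network outputs: each per-feature predictor can favor the value 1 on its own, while the true joint pmf assigns the combined event zero probability.

```python
# Hypothetical outputs of two separately trained per-feature networks
# for the same input x (values are illustrative, not from the paper).
p1 = 0.6  # network 1: P(F1 = 1 | x)
p2 = 0.6  # network 2: P(F2 = 1 | x)

# Treating the networks as if the features were independent implies a
# sizable probability for the joint event (F1 = 1, F2 = 1)...
implied_joint_11 = p1 * p2  # 0.36

# ...even though the data-generating joint pmf may forbid it entirely.
true_joint_11 = 0.0
```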
1.2 Bayesian Networks
- BN's handle missing features and capture dependencies between the multiple features.
- The joint pmf is explicitly a product of conditional probabilities.
- They are versatile tools for inference with a convenient, informative representation.
1.2 Bayesian Networks
Several difficulties with BN's:
- conditional independence relations between features must be made explicit
- optimizing over the set of possible BN structures: sequential, greedy methods may be suboptimal
- in sequential learning, deciding where to stop to avoid overfitting
1.3 Maximum Entropy Models
- Cheeseman proposed the maximum entropy (ME) joint pmf consistent with arbitrary lower-order probability constraints.
- This is powerful, allowing the joint pmf to express general dependencies between features.
1.3 Maximum Entropy Models
Several difficulties with ME:
- learning, i.e., estimating the ME pmf, is difficult
- Ku and Kullback proposed an iterative algorithm that satisfies one constraint at a time, but satisfying one constraint may cause violation of others
- they only presented results for dimension N = 4 with J = 2 discrete values per feature
- Pearl cites complexity as the main barrier to using ME
1.4 Advantages of the Proposed ME Method over BN’s
Our approach:
- does not require explicit conditional independence
- uses an effective joint optimization learning technique
2. Maximum Entropy Joint PMF
- A random feature vector F = (F_1, F_2, ..., F_N), with each F_i ∈ A_i = {1, 2, 3, ..., |A_i|}.
- The full discrete feature space is A_1 × A_2 × ... × A_N.
2. Maximum Entropy Joint PMF
- Pairwise pmfs: constrain the joint pmf P[F] to agree with the pairwise pmfs {P[F_m, F_n], m ≠ n}.
- The ME joint pmf consistent with these pairwise pmfs has the Gibbs form, with one Lagrange multiplier per pairwise constraint.
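A small sketch of the Gibbs form, assuming a pmf built from one multiplier table per feature pair; the alphabet sizes and multiplier values are made up, and the full space is enumerated only because the example is tiny (the point of the paper is that this enumeration is intractable in general).

```python
import itertools
import numpy as np

# Hypothetical alphabet sizes |A_1|, |A_2|, |A_3| and random pairwise
# Lagrange multipliers gamma[(m, n)][f_m, f_n].
sizes = [2, 3, 2]
rng = np.random.default_rng(0)
gamma = {(m, n): rng.normal(size=(sizes[m], sizes[n]))
         for m in range(len(sizes)) for n in range(m + 1, len(sizes))}

def unnormalized(f):
    # Gibbs form: exp of the sum of the multipliers selected by f
    return np.exp(sum(g[f[m], f[n]] for (m, n), g in gamma.items()))

space = list(itertools.product(*[range(s) for s in sizes]))
Z = sum(unnormalized(f) for f in space)        # partition function
pmf = {f: unnormalized(f) / Z for f in space}  # sums to 1 over the space
```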
2. Maximum Entropy Joint PMF
- Each Lagrange multiplier corresponds to an equality constraint on an individual pairwise probability P[F_m = f_m, F_n = f_n].
- The joint pmf is thus specified by the set of Lagrange multipliers Γ = {γ(f_m, f_n) : f_m ∈ A_m, f_n ∈ A_n, m ≠ n}.
- Although these probabilities also depend on Γ, they can often be tractably computed.
2. Maximum Entropy Joint PMF
Two major difficulties:
- the optimization requires calculating the constraint cost D, which is intractable
- computing P[f] requires marginalizations over the joint pmf, which are also intractable
These difficulties inspired the approximate ME approach.
2.1 Review of the ME Formulation for Classification
- The augmented random feature vector is F̃ = (F, C), with class label C ∈ {1, 2, ..., K}.
- P[F̃] still has the intractable form (1), so computing it directly is still not feasible.
- Classification, however, does not require computing P[F̃], but rather just the a posteriori class probabilities.
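The point can be sketched with toy numbers: only the joint values at the observed f are needed, normalized over the class label, so the full table P[F̃] is never materialized. The probabilities below are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical joint values P[F = f, C = c] at one observed f, c = 1..3.
joint_at_f = np.array([0.02, 0.05, 0.01])

# Bayes' rule: a posteriori class probabilities P[C = c | f].
posterior = joint_at_f / joint_at_f.sum()        # ≈ [0.25, 0.625, 0.125]
predicted_class = int(np.argmax(posterior)) + 1  # class 2
```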
2.1 Review of the ME Formulation for Classification
Here we review a tractable, approximate method:
- Joint PMF Form
- Support Approximation
- Lagrangian Formulation
2.1.1 Joint PMF Form
- Via Bayes' rule, the joint pmf {P[f], f ∈ A_1 × A_2 × ... × A_N} is rewritten in terms of a posteriori probabilities.
2.1.2 Support Approximation
- The approximation may have some effect on the accuracy of the learned model, but will not sacrifice our inference capability.
- The full feature space A_1 × A_2 × ... × A_N is replaced by a computationally feasible subset built from the training pairs {(F_i = f_i, C = c)}.
- Example: for N = 19, the full space has roughly 40 billion points versus about 100 training points; the reduction is huge.
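The size comparison can be reproduced with hypothetical alphabet sizes (the slide does not give the per-feature cardinalities, so the sizes below are chosen only to land near the quoted tens of billions):

```python
import math

# Hypothetical cardinalities for N = 19 discrete features.
sizes = [4] * 12 + [3] * 7
full_space = math.prod(sizes)   # 4**12 * 3**7 ≈ 3.7e10 points
train_points = 100              # approximate support after the reduction

reduction = full_space // train_points  # hundreds of millions to one
```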
2.1.3 Lagrangian Formulation
- The support is taken to be the training set, i.e., F = {f^(m) = (f_1^(m), f_2^(m), ..., f_N^(m)), m = 1, ..., |F|}.
- The joint entropy H for the pmf restricted to F can then be written.
2.1.3 Lagrangian Formulation
- The constraints suggest using the cross entropy, i.e., the cross entropy/Kullback distance between the model's pairwise pmfs and the measured ones, as the constraint cost.
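As a sketch, the Kullback distance between a measured pairwise pmf and the model's can be computed directly; the two tables below are made-up 2×2 pmfs for a feature pair (F_k, F_l).

```python
import numpy as np

def kullback_distance(p, q, eps=1e-12):
    """Cross entropy / Kullback distance D(p || q) between two pmfs
    over the same discrete cells (here flattened pairwise tables)."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0  # cells with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

measured = np.array([[0.25, 0.25], [0.25, 0.25]])  # empirical pairwise pmf
model    = np.array([[0.40, 0.10], [0.10, 0.40]])  # model pairwise pmf

d = kullback_distance(measured, model)  # > 0; zero iff the pmfs agree
```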
2.1.3 Lagrangian Formulation
- Analogous costs are formed for pairwise constraints involving the class label, P[F_k, C].
2.1.3 Lagrangian Formulation
- The overall constraint cost D is formed as a sum of all the individual pairwise costs.
- Given D and H, one can form the Lagrangian cost function.
3. Extensions for More General Inference Problems
- General Statistical Inference
  - Joint PMF Representation
  - Support Approximation
  - Lagrangian Formulation
- Discussion
- Mixed Discrete and Continuous Feature Space
3.1.1 Joint PMF Representation
- The a posteriori probabilities take an analogous form.
3.1.1 Joint PMF Representation
- With respect to each feature F_i, the joint pmf can be written in a form suited to inferring that feature.
3.1.2 Support Approximation
- The reduced joint pmf for inferring feature F_i uses the support S^(i)(f) = {f^(m) : f^(m) agrees with f in every feature except possibly F_i}, if there is such a set.
3.1.3 Lagrangian Formulation
- The joint entropy H can be written analogously over the reduced support.
3.1.3 Lagrangian Formulation
- The pairwise pmf P_M[F_k, F_l] can be calculated in two different ways.
3.1.3 Lagrangian Formulation
- The overall constraint cost D is again formed from the individual pairwise costs.
3.1.3 Lagrangian Formulation
3.2. Discussion
- Choice of Constraints: encode all probabilities of second order
- Tractability of Learning
- Qualitative Comparison of Methods
3.3. Mixed Discrete and Continuous Feature Space
- The feature vector is written (F, A), where F = (F_1, F_2, ..., F_{N_d}) collects the discrete features and A = (A_1, A_2, ..., A_{N_c}) the continuous features.
- Our objective is to learn {P[c | f, a]}.
3.3. Mixed Discrete and Continuous Feature Space
- Given our choice of constraints, these probabilities are obtained by decomposing the joint density.
3.3. Mixed Discrete and Continuous Feature Space
- A conditional mean constraint is imposed on A_i given C = c.
- A constraint is also imposed for each pair of continuous features A_i, A_j.
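For the conditional mean constraint, the matched statistic is just the class-conditional sample mean; a sketch with made-up data:

```python
import numpy as np

# Hypothetical continuous feature A_i and class labels C for 5 samples.
A_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
C   = np.array([0,   0,   1,   1,   1])

# Empirical conditional mean E[A_i | C = c], the quantity a
# conditional mean constraint asks the model to match.
cond_mean = {int(c): float(A_i[C == c].mean()) for c in np.unique(C)}
# cond_mean == {0: 1.5, 1: 4.0}
```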
4. Experimental Results
- Evaluation of generalized classification performance on data sets used solely for classification: Mushroom, Congress, Nursery, Zoo, Hepatitis
- Generalized classification performance on data sets with multiple possible class features: Solar Flare, Flag, Horse Colic
- Classification performance on data sets with mixed continuous and discrete features: Credit Approval, Hepatitis, Horse Colic
4. Experimental Results
The ME method was compared with:
- Bayesian networks (BN)
- decision trees (DT)
- a powerful extension of DT: mixtures of DT's
- multilayer perceptrons (MLP)
4. Experimental Results
- For an arbitrary feature to be inferred, F_i, the method computes the a posteriori probabilities.
4. Experimental Results
The following criteria are used to evaluate all the methods:
(1) misclassification rate on the test set for the data set's class label
(2) as in (1), but with a single feature missing at random
(3) average misclassification rate on the test set
(4) misclassification rate on the test set, based on predicting a pair of randomly chosen features
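Criterion (2) can be sketched on a toy joint pmf: when a feature is missing, it is marginalized out of the joint before normalizing over the class label. The pmf below is random and purely illustrative.

```python
import numpy as np

# Toy joint pmf P[f1, f2, c] over two binary features and a binary class.
P = np.random.default_rng(1).dirichlet(np.ones(8)).reshape(2, 2, 2)

def posterior_c(f1=None, f2=None):
    """A posteriori class pmf; a feature passed as None is missing
    and is marginalized out rather than guessed."""
    T = P
    T = T[f1:f1 + 1] if f1 is not None else T
    T = T[:, f2:f2 + 1] if f2 is not None else T
    p = T.sum(axis=(0, 1))
    return p / p.sum()

full_obs = posterior_c(f1=1, f2=0)  # both features observed
missing  = posterior_c(f1=1)        # f2 missing: marginalized out
```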
5. Conclusions and Possible Extensions
- Regression
- Large-Scale Problems
- Model Selection: Searching for ME Constraints
- Applications
Personal opinion …