Introduction to Machine Learning
Probabilistic Graphical Models
Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Eric Xing, Matt Gormley
Yifeng Tao, Carnegie Mellon University
Introduction to Machine Learning
![Page 2: Introduction to Machine Learningyifengt/courses/machine-learning/slides/lecture8-probabilistic...Markov Blanket oDef: the co-parents of a node are the parents of its children oDef:](https://reader036.fdocuments.in/reader036/viewer/2022062609/60fcfac0a735801de847a68a/html5/thumbnails/2.jpg)
Recap of Basic Probability Concepts
- Representation: how do we represent the joint probability distribution over multiple binary variables?
- There are 2^8 = 256 state configurations in total.
- Do they all need to be represented explicitly? Do we gain any scientific/medical insight from doing so?
- Learning: where do we get all these probabilities? Maximum-likelihood estimation?
- Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence? Computing p(H | A) would require summing over all 2^6 configurations of the unobserved variables.
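To make that cost concrete, here is a minimal sketch (not from the slides; it uses a hypothetical 3-variable joint rather than the 8-variable example) of brute-force conditioning on a full joint table. The denominator sums over every configuration consistent with the evidence, which is exactly the computation that scales as 2^n:

```python
# Hypothetical joint distribution p(x0, x1, x2) over three binary variables;
# the entries sum to 1. Names and numbers are illustrative only.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.20, (0, 1, 1): 0.05,
    (1, 0, 0): 0.15, (1, 0, 1): 0.20, (1, 1, 0): 0.10, (1, 1, 1): 0.15,
}

def conditional(joint, query_idx, query_val, evid_idx, evid_val):
    """p(X_query = query_val | X_evid = evid_val) by exhaustive summation."""
    num = sum(p for cfg, p in joint.items()
              if cfg[query_idx] == query_val and cfg[evid_idx] == evid_val)
    den = sum(p for cfg, p in joint.items() if cfg[evid_idx] == evid_val)
    return num / den

print(conditional(joint, 0, 1, 2, 1))  # p(X0=1 | X2=1)
```

With 8 variables the same loops would touch 256 entries per query, which is the motivation for exploiting structure.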
[Slide from Eric Xing.]
Graphical Model: Structure Simplifies Representation
- Dependencies among variables
[Slide from Eric Xing.]
Probabilistic Graphical Models
- If the Xi's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,
- Why might we favor a PGM?
  - Incorporation of domain knowledge and causal (logical) structures
  - 2+2+4+4+4+8+4+8 = 36 parameters, nearly an 8-fold reduction from the 2^8 = 256 entries of the full joint representation!
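The representation cost can be checked mechanically: each binary node's CPT has 2^(1 + number of parents) entries. The parent sets below are an assumption reconstructed from the stated cost 2+2+4+4+4+8+4+8, not read off the slide's figure:

```python
# Assumed parent sets for the 8 binary variables (hypothetical reconstruction).
parents = {
    "X1": [], "X2": [], "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
    "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"],
}

# Factored cost: one CPT per node, 2^(1 + #parents) entries each.
factored_cost = sum(2 ** (1 + len(ps)) for ps in parents.values())
# Full joint over n binary variables needs 2^n entries.
full_joint_cost = 2 ** len(parents)
print(factored_cost, full_joint_cost)  # 36 vs 256
```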
[Slide from Eric Xing.]
Two types of GMs
- Directed edges encode causal relationships (Bayesian network, or directed graphical model).
- Undirected edges simply encode correlations between variables (Markov random field, or undirected graphical model).
[Slide from Eric Xing.]
Bayesian Network
- Definition: a Bayesian network consists of a graph G and the conditional probabilities P. These two parts fully specify the distribution:
  - Qualitative specification: G
  - Quantitative specification: P
[Slide from Eric Xing.]
Where does the qualitative specification come from?
- Prior knowledge of causal relationships
- Learning from data (i.e., structure learning)
- We simply prefer a certain architecture (e.g., a layered graph)
- ...
[Slide from Matt Gormley.]
Quantitative Specification
- Example: conditional probability tables (CPTs) for discrete random variables
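The slide's table itself is in the figure; as a stand-in, here is a minimal sketch of how a CPT can be stored, for a hypothetical binary child C with one binary parent P (variables and numbers are illustrative, not the slide's):

```python
# CPT for p(C | P): one row per parent value; each row must sum to 1.
cpt_C_given_P = {
    0: {0: 0.9, 1: 0.1},   # p(C | P=0)
    1: {0: 0.3, 1: 0.7},   # p(C | P=1)
}

# Sanity check: every row of a CPT is a proper distribution.
for row in cpt_C_given_P.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

print(cpt_C_given_P[1][1])  # p(C=1 | P=1)
```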
[Slide from Eric Xing.]
Quantitative Specification
- Example: conditional probability density functions (CPDs) for continuous random variables
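A common concrete choice of CPD, used here purely as an illustration (the slide's own figure may use a different form), is the linear-Gaussian model p(y | x) = N(a·x + b, s²):

```python
import math
import random

# Hypothetical linear-Gaussian CPD: y | x ~ N(a*x + b, s^2).
def sample_linear_gaussian(x, a=2.0, b=1.0, s=0.5, rng=random.Random(0)):
    """Draw one sample of y given the parent value x."""
    return a * x + b + rng.gauss(0.0, s)

def logpdf_linear_gaussian(y, x, a=2.0, b=1.0, s=0.5):
    """Log density of y given x under the same CPD."""
    mu = a * x + b
    return -0.5 * math.log(2 * math.pi * s * s) - (y - mu) ** 2 / (2 * s * s)

print(logpdf_linear_gaussian(1.0, 0.0))  # log density at the conditional mean
```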
[Slide from Eric Xing.]
Observed Variables
- In a graphical model, shaded nodes are "observed," i.e., their values are given.
[Slide from Matt Gormley.]
GMs are your old friends
- Density estimation: parametric and nonparametric methods
- Regression: linear, conditional mixture, nonparametric
- Classification: generative and discriminative approaches
- Clustering
[Slide from Eric Xing.]
What Independencies does a Bayes Net Model?
- Are X and Z independent given Y? That is, does P(X, Z | Y) = P(X | Y) P(Z | Y) hold?
- Three cases of interest... Proof?
[Slide from Matt Gormley.]
The “Burglar Alarm” example
- Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.
- The Earth arguably doesn't care whether your house is currently being burgled.
- While you are on vacation, one of your neighbors calls to tell you that your home's burglar alarm is ringing.
[Slide from Matt Gormley.]
Markov Blanket
- Def: the co-parents of a node are the parents of its children.
- Def: the Markov blanket of a node is the set containing the node's parents, children, and co-parents.
- Thm: a node is conditionally independent of every other node in the graph given its Markov blanket.
- Example: the Markov blanket of X6 is {X3, X4, X5, X8, X9, X10}.
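The definition translates directly into code. The edges below are a hypothetical graph chosen so that the blanket of X6 matches the slide's example; the actual figure may wire the nodes differently:

```python
# Assumed parent lists (hypothetical reconstruction, not the slide's figure).
parents = {
    "X3": [], "X4": [], "X5": [], "X10": [],
    "X6": ["X3", "X4"],
    "X8": ["X5", "X6"],
    "X9": ["X6", "X10"],
}

def markov_blanket(node, parents):
    """Parents, children, and co-parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    coparents = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | coparents

print(sorted(markov_blanket("X6", parents)))
```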
[Slide from Matt Gormley.]
D-Separation
- Thm: if variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given E.
- Definition: variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is "blocked."
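Whether a path is blocked is decided triple by triple. One common way to state the blocking rules (a sketch, not the slides' exact presentation) is: a middle node M blocks a chain or fork iff M is observed, and blocks a collider iff neither M nor any of its descendants is observed:

```python
def triple_blocked(kind, m, evidence, descendants_of_m=()):
    """Is the path blocked at middle node m?

    kind: "chain" (X->M->Z), "fork" (X<-M->Z), or "collider" (X->M<-Z).
    evidence: set of observed variable names.
    """
    if kind in ("chain", "fork"):
        return m in evidence
    if kind == "collider":
        return m not in evidence and not any(d in evidence for d in descendants_of_m)
    raise ValueError(kind)

print(triple_blocked("chain", "M", {"M"}))     # observing the middle blocks
print(triple_blocked("collider", "M", set()))  # an unobserved collider blocks
print(triple_blocked("collider", "M", {"M"}))  # observing it unblocks the path
```

The collider case is exactly the "explaining away" behavior from the burglar-alarm example.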
[Slide from Matt Gormley.]
D-Separation

- Variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is "blocked."
[Slide from Eric Xing.]
Machine Learning
[Slide from Matt Gormley.]
Recipe for Closed-form MLE
[Slide from Matt Gormley.]
Learning Fully Observed BNs
- How do we learn these conditional and marginal distributions for a Bayes net?
[Slide from Matt Gormley.]
Learning Fully Observed BNs
- Learning this fully observed Bayesian network is equivalent to learning five (small/simple) independent networks from the same data.
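For discrete variables each of those small problems reduces to counting. A minimal sketch (with hypothetical data over a parent P and child C, not the slides' network) of the closed-form MLE for one node's CPT:

```python
from collections import Counter

# Hypothetical fully observed data: (parent value, child value) pairs.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]
counts = Counter(data)

def mle_cpt(counts):
    """MLE of p(C | P): counts normalized within each parent configuration."""
    cpt = {}
    for (p, c), n in counts.items():
        cpt.setdefault(p, {})[c] = n
    for row in cpt.values():
        total = sum(row.values())
        for c in row:
            row[c] /= total
    return cpt

print(mle_cpt(counts))  # e.g. p(C=0 | P=0) = 3/4 for this data
```

The full network's MLE is just this computation repeated once per node, each with its own parent set.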
[Slide from Matt Gormley.]
Learning Partially Observed BNs
- Partially observed Bayesian network: maximum-likelihood estimation leads to an incomplete log-likelihood, because the log-likelihood contains unobserved latent variables.
- Solve with the EM algorithm. Example: Gaussian mixture models (GMMs).
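A minimal EM sketch for a 1-D, two-component GMM, where the component label is the latent variable (unregularized; real code would guard against collapsing variances):

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture."""
    pi, mu, var = [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in (0, 1)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: responsibility-weighted MLE updates.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
    return pi, mu, var

# Synthetic data from two well-separated Gaussians.
rng = random.Random(0)
xs = [rng.gauss(-3, 1) for _ in range(200)] + [rng.gauss(3, 1) for _ in range(200)]
pi, mu, var = em_gmm(xs)
print(sorted(mu))  # the recovered means, roughly [-3, 3]
```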
[Slide from Eric Xing.]
Inference of BNs
- Suppose we already have the parameters of a Bayesian network...
[Slide from Matt Gormley.]
Approaches to inference
- Exact inference algorithms:
  - The elimination algorithm → message passing
  - Belief propagation
  - The junction tree algorithms
- Approximate inference techniques:
  - Variational algorithms
  - Stochastic simulation / sampling methods
  - Markov chain Monte Carlo methods
[Slide from Eric Xing.]
Marginalization and Elimination
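The slides' worked example lives in the figures; as a stand-in, here is a minimal sketch of variable elimination on a hypothetical chain X1 → X2 → X3 (all CPT numbers are illustrative). Summing out one variable at a time produces intermediate "messages" and never materializes the full joint:

```python
# Hypothetical CPTs for the chain X1 -> X2 -> X3.
p1 = {0: 0.6, 1: 0.4}                             # p(X1)
p2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(X2 | X1)
p3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # p(X3 | X2)

# Eliminate X1: m12(x2) = sum_x1 p(x1) p(x2 | x1)
m12 = {x2: sum(p1[x1] * p2[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)}
# Eliminate X2: p(x3) = sum_x2 m12(x2) p(x3 | x2)
pX3 = {x3: sum(m12[x2] * p3[x2][x3] for x2 in (0, 1)) for x3 in (0, 1)}
print(pX3)
```

On a chain of n binary variables this costs O(n) small sums instead of the 2^n terms of naive marginalization.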
[Slide from Eric Xing.]
- Step 8: Wrap-up
[Slide from Eric Xing.]
Elimination algorithm
- Elimination on trees is equivalent to message passing along branches.
- Message passing is consistent on trees.
- Application: HMMs
[Slide from Eric Xing.]
Gibbs Sampling
- Full conditionals only need to condition on the Markov blanket.
- It must be "easy" to sample from these conditionals.
- Many conditionals are log-concave and amenable to adaptive rejection sampling.
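A minimal Gibbs-sampling sketch on a hypothetical joint over two binary variables, where each full conditional is computed directly from the joint table (in a real BN it would be computed from the Markov blanket's CPTs):

```python
import random

# Hypothetical joint p(x, y) over two binary variables; entries sum to 1.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def gibbs(n_samples, burn_in=1000, rng=random.Random(0)):
    """Alternate sampling x ~ p(x | y) and y ~ p(y | x)."""
    x, y = 0, 0
    samples = []
    for t in range(burn_in + n_samples):
        px1 = joint[(1, y)] / (joint[(0, y)] + joint[(1, y)])
        x = 1 if rng.random() < px1 else 0
        py1 = joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])
        y = 1 if rng.random() < py1 else 0
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs(20000)
freq_00 = sum(1 for s in samples if s == (0, 0)) / len(samples)
print(freq_00)  # should approach joint[(0, 0)] = 0.4
```

The empirical frequencies of the chain converge to the target joint, which is the whole point of the method.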
[Slide from Matt Gormley.]
Take-home message

- Graphical models portray the sparse dependencies among variables.
- Two types of graphical models: Bayesian networks and Markov random fields.
- Conditional independence, Markov blankets, and d-separation.
- Learning fully observed and partially observed Bayesian networks.
- Exact and approximate inference in Bayesian networks.
References
- Eric Xing, Ziv Bar-Joseph. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
- Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html