Pattern Recognition and Machine Learning : Graphical...
Transcript of Pattern Recognition and Machine Learning : Graphical...
![Page 1: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/1.jpg)
C. M. Bishop
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part I
![Page 2: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/2.jpg)
BCS Summer School, Exeter, 2003 Christopher M. Bishop
Probabilistic Graphical Models
• Graphical representation of a probabilistic model
• Each variable corresponds to a node in the graph
• Links in the graph denote probabilistic relations between
variables
![Page 3: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/3.jpg)
Why do we need graphical models?
• Graphs are an intuitive way of representing and visualising the
relationships between many variables. (Examples: family trees,
electric circuit diagrams, neural networks)
• A graph allows us to abstract out the conditional independence
relationships between the variables from the details of their
parametric forms. Thus we can ask questions like: “Is A dependent
on B given that we know the value of C ?” just by looking at the
graph.
• Graphical models allow us to define general message-passing
algorithms that implement Bayesian inference efficiently. Thus we
can answer queries like “What is P(A|C = c)?” without enumerating
all settings of all variables in the model.
Zoubin Ghahramani
![Page 4: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/4.jpg)
Three kinds of Graphical Models
image by Zoubin Ghahramani
or Bayesian Networks
useful to express causal
relationships between
variables
or Markov random fields
useful to express soft
constraints between
variables
convenient for
solving inference
problems
![Page 5: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/5.jpg)
Bayesian Networks
Directed Acyclic Graph (DAG)
Note: the left-hand side is symmetrical w/r to the variables whereas the right-
hand side is not.
![Page 6: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/6.jpg)
Bayesian Networks
Generalization to K variables:
The associated graph is fully connected.
The absence of links conveys important information.
![Page 7: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/7.jpg)
Bayesian Networks
General Factorization
![Page 8: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/8.jpg)
Some Notations(1)
Plate
N
n
ntppp1
)w|()w()w,t(
More compact
representation
![Page 9: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/9.jpg)
Some Notations(2)
Condition on data
Shaded to indicate that the r.v. is set to its observed value
![Page 10: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/10.jpg)
Generative Models
Causal process for generating images
![Page 11: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/11.jpg)
PPCA
nx
nz
N
W
2
![Page 12: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/12.jpg)
Discrete Variables
Denote the probability of observing both and
by
General joint distribution: K 2 { 1 parameters
K states K states
11 kx
12 lx 1 , klk lkl
)x()x|x()x,x( 11221 ppp
K -1 parameters
K -1 parameters
K2-1 parameters
![Page 13: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/13.jpg)
Discrete Variables
Independent joint distribution: 2(K { 1) parameters
![Page 14: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/14.jpg)
Discrete Variables
General joint distribution over M variables: KM { 1 parameters
M -node Markov chain: K { 1 + (M { 1) K(K { 1) parameters
Reduce the number of parameters by dropping link in the graph, at the
expense of having a restricted class of distributions.
)x( 1pMipp iii ,...,2),x()x|x( 11
![Page 15: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/15.jpg)
Conditional Independence
a is independent of b given c
Equivalently
Notation
![Page 16: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/16.jpg)
Conditional Independence: Example 1
marginalize w.r.t c
as it doesn‟t factorize into p(a)p(b) in
general
tail-to-tail
![Page 17: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/17.jpg)
Conditional Independence: Example 1
tail-to-tail
![Page 18: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/18.jpg)
Conditional Independence: Example 2
marginalize w.r.t c
head-to-tail
![Page 19: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/19.jpg)
Conditional Independence: Example 2
head-to-tail
![Page 20: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/20.jpg)
Conditional Independence: Example 3
Note: this is the opposite of Example 1 and 2, with c unobserved.
marginalize w.r.t c
head-to-head
![Page 21: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/21.jpg)
Conditional Independence: Example 3
Note: this is the opposite of Example 1 and 2, with c observed.
head-to-head
![Page 22: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/22.jpg)
“Am I out of fuel?”
B = Battery (0=flat, 1=fully charged) F = Fuel Tank (0=empty, 1=full) G = Fuel Gauge Reading (0=empty, 1=full)
and hence
![Page 23: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/23.jpg)
“Am I out of fuel?”
Probability of an empty tank increased by observing G = 0.
![Page 24: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/24.jpg)
“Am I out of fuel?”
Probability of an empty tank reduced by observing B = 0. This referred to as “explaining away”.
the state of the fuel tank and that of the
battery have become dependent on each
other as a result of observing the reading
on the fuel gauge.
![Page 25: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/25.jpg)
D-separation • A, B, and C are non-intersecting subsets of nodes in a
directed graph. • A path from A to B is blocked if it contains a node such that
either a) the arrows on the path meet either head-to-tail or tail-
to-tail at the node, and the node is in the set C, or b) the arrows meet head-to-head at the node, and
neither the node, nor any of its descendants, are in the set C.
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies .
![Page 26: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/26.jpg)
D-separation: Example
![Page 27: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/27.jpg)
D-separation: I.I.D. Data
![Page 28: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/28.jpg)
The Markov Blanket
Markov Blanket (remaining factors)
Parents and children of xi
Co-parents: parents of children of xi
due to „explaining away‟ phenomenon
![Page 29: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/29.jpg)
Factorization Properties
![Page 30: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/30.jpg)
Cliques and Maximal Cliques
Clique
Maximal Clique
Thus, if {x1, x2, x3} is a maximal clique and we define an arbitrary function over this clique, then including another factor defined over a subset of these variables would be redundant.
![Page 31: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/31.jpg)
Notations: C – max. clique, - the set of variables in that clique.
Joint Distribution
where is the potential over clique C , which is a non-negative function
which measures “compatibility” between settings of the variables.
is the partition function (used for normalization).
note: M K-state variables KM terms in Z.
![Page 32: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/32.jpg)
Hammersley and Clifford Theorem
• UI: the set of distributions that are consistent with the set of conditional independence statements read from the graph using graph separation.
• UF: the set of distributions that can be expressed as a factorization described with respect to the maximal cliques.
• The Hammersley-Clifford theorem states that the sets UI and UF are identical if is strictly positive.
• In such case
where is called an energy function, and the exponential representation is called the Boltzmann distribution.
![Page 33: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/33.jpg)
Illustration: Image De-Noising using MRF
Original Image Noisy Image
Noisy image is obtained by randomly flipping the sign of pixels in the
Noise free image with some small probability.
The Goal: Given Noisy image recover Noise Free Image
![Page 34: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/34.jpg)
Markov Model
strong correlation between xi and yi.
neighbouring pixels xi and xj in an image are
strongly correlated.
![Page 35: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/35.jpg)
Markov Model
biasing the model towards pixel
values that have one particular
sign in preference to the other.
![Page 36: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/36.jpg)
The joint distribution IC
M -
ite
rate
d c
onditio
nal m
odes
having a high probability
![Page 37: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/37.jpg)
Illustration: Image De-Noising (3)
Noisy Image Restored Image (ICM)
![Page 38: Pattern Recognition and Machine Learning : Graphical …rita/uml_course/lectures/prml8_update_p1.pdf · c. m. bishop pattern recognition and machine learning chapter 8: graphical](https://reader031.fdocuments.in/reader031/viewer/2022021819/5abe6df57f8b9a8e3f8cf78c/html5/thumbnails/38.jpg)
Illustration: Image De-Noising (4)
Restored Image (Graph cuts) Restored Image (ICM)